1 Case study 1: ISLR::Auto data

This is the last part of our work with the Auto data from ISLR. The original data contained 408 observations about cars; the ISLR version keeps the 392 observations with no missing values. It is similar to the Cars data that we use in our lectures. To get the data, first install the package ISLR; once the package is loaded, the data set Auto is available automatically. We use this case to go through the methods learned so far.

Final modelling question: We want to explore the effects of each feature as best as possible.

1.1 Preparing variables:

  1. You may explore the possibility of variable transformations. We normally do not suggest transforming \(x\), for the sake of interpretation. You may consider transforming \(y\) either to correct a violation of the linear model assumptions or because a transformation of \(y\) makes more sense on theoretical grounds. In this case we suggest you look into GPM = 1/MPG. Compare the residual plots with MPG and with GPM as the response and see which one yields the more satisfactory pattern.
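Auto <- ISLR::Auto
# Work on a copy whose response is gallons per mile (GPM = 1/MPG)
AutoCopy <- Auto
AutoCopy$mpg <- 1/Auto$mpg
colnames(AutoCopy)[which(colnames(AutoCopy) == "mpg")] <- "gpm"
# Same predictors, two responses: MPG (model1) and GPM (model2).
# Referring to origin by name lets lm() take it from the data argument.
model1 <- lm(mpg ~ horsepower + weight + year + as.factor(origin), data = Auto)
model2 <- lm(gpm ~ horsepower + weight + year + as.factor(origin), data = AutoCopy)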

# Residuals vs fitted values for the MPG model
res1 <- resid(model1)
plot(fitted(model1), res1)
abline(0,0)

# Residuals vs fitted values for the GPM model
res2 <- resid(model2)
plot(fitted(model2), res2)
abline(0,0)

In addition, can you provide some background knowledge to support the notion: it makes more sense to model GPM?

Visual inspection of the two residual plots shows that the MPG fit leaves structure uncaptured, whereas the GPM fit largely picks up this trend. The residual plot of the MPG model is unbalanced: the residuals are all positive when \(y\) is small. The model with the reciprocal transformation removes this anomaly. The transformation also makes physical sense, because a linear increase in weight should lead to a proportional increase in fuel consumption (GPM), not in fuel efficiency (MPG).

When fitting a linear or polynomial model, it makes sense to choose a response that is directly, rather than inversely, related to the predictors.
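A small piece of arithmetic supports the case for GPM: the fuel burned over a fixed distance is proportional to GPM, so equal gains in MPG do not save equal amounts of fuel. The sketch below uses an illustrative 100-mile trip and illustrative MPG values; none of these numbers come from the Auto data.

```r
# Fuel needed for a fixed trip is distance * GPM = distance / MPG.
trip_miles <- 100                 # illustrative trip length (assumption)
mpg <- c(10, 20, 40)              # illustrative fuel efficiencies (assumption)
gallons <- trip_miles / mpg       # gallons burned at each efficiency

# Fuel saved by each successive doubling of MPG
savings <- -diff(gallons)
data.frame(from_mpg = mpg[-3], to_mpg = mpg[-1], gallons_saved = savings)
```

Going from 10 to 20 mpg saves twice as much fuel as going from 20 to 40 mpg, which is why effects that add up on the fuel-consumption scale look curved on the MPG scale.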

  2. You may also explore by adding interactions and higher order terms. The model(s) should be as parsimonious (simple) as possible, unless the gain in accuracy is significant from your point of view.
model3 <- lm(gpm ~ horsepower + weight + year + as.factor(AutoCopy$origin) + 
               (horsepower + weight + year)^2 + 
               I(horsepower^2) + I(horsepower^3) +
               I(weight^2) + I(weight^3) +
               I(year^2) + I(year^3)
               , data = AutoCopy)

library(leaps)  # provides regsubsets()
model3.exh <- regsubsets(gpm ~ horsepower + weight + year + 
                           as.factor(AutoCopy$origin) + 
               (horsepower + weight + year)^2 + 
               I(horsepower^2) + I(horsepower^3) +
               I(weight^2) + I(weight^3) +
               I(year^2) + I(year^3)
               , data = AutoCopy,  nvmax=25, method="exhaustive")

f.e <- summary(model3.exh)
names(f.e)
## [1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"
str(f.e)
## List of 8
##  $ which : logi [1:14, 1:15] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:14] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:15] "(Intercept)" "horsepower" "weight" "year" ...
##  $ rsq   : num [1:14] 0.785 0.878 0.883 0.885 0.887 ...
##  $ rss   : num [1:14] 0.0232 0.0132 0.0127 0.0125 0.0123 ...
##  $ adjr2 : num [1:14] 0.785 0.877 0.882 0.883 0.885 ...
##  $ cp    : num [1:14] 386.2 54.7 39.1 34.1 28.3 ...
##  $ bic   : num [1:14] -591 -806 -816 -817 -818 ...
##  $ outmat: chr [1:14, 1:14] " " " " " " " " ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:14] "1  ( 1 )" "2  ( 1 )" "3  ( 1 )" "4  ( 1 )" ...
##   .. ..$ : chr [1:14] "horsepower" "weight" "year" "as.factor(AutoCopy$origin)2" ...
##  $ obj   :List of 28
##   ..$ np       : int 15
##   ..$ nrbar    : int 105
##   ..$ d        : num [1:15] 3.92e+02 1.23e+08 6.05e+01 2.94e+10 4.69e+11 ...
##   ..$ rbar     : num [1:105] 5.79e+03 2.02e-01 1.24e+04 2.25e+05 3.39e+05 ...
##   ..$ thetab   : num [1:15] 4.78e-02 -1.66e-05 -1.28e-02 1.13e-06 1.69e-07 ...
##   ..$ first    : int 2
##   ..$ last     : int 15
##   ..$ vorder   : int [1:15] 1 11 6 7 15 13 3 12 4 10 ...
##   ..$ tol      : num [1:15] 9.90e-09 6.53e-05 9.65e-09 2.44e-04 3.22e-03 ...
##   ..$ rss      : num [1:15] 0.1083 0.0743 0.0644 0.0267 0.0133 ...
##   ..$ bound    : num [1:15] 0.1083 0.0232 0.0132 0.0127 0.0125 ...
##   ..$ nvmax    : int 15
##   ..$ ress     : num [1:15, 1] 0.1083 0.0232 0.0132 0.0127 0.0125 ...
##   ..$ ir       : int 15
##   ..$ nbest    : int 1
##   ..$ lopt     : int [1:120, 1] 1 1 13 1 3 15 1 14 15 3 ...
##   ..$ il       : int 120
##   ..$ ier      : int 0
##   ..$ xnames   : chr [1:15] "(Intercept)" "horsepower" "weight" "year" ...
##   ..$ method   : chr "exhaustive"
##   ..$ force.in : Named logi [1:15] TRUE FALSE FALSE FALSE FALSE FALSE ...
##   .. ..- attr(*, "names")= chr [1:15] "" "horsepower" "weight" "year" ...
##   ..$ force.out: Named logi [1:15] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   .. ..- attr(*, "names")= chr [1:15] "" "horsepower" "weight" "year" ...
##   ..$ sserr    : num 0.0113
##   ..$ intercept: logi TRUE
##   ..$ lindep   : logi [1:15] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..$ nullrss  : num 0.108
##   ..$ nn       : int 392
##   ..$ call     : language regsubsets.formula(gpm ~ horsepower + weight + year + as.factor(AutoCopy$origin) +      (horsepower + weight + ye| __truncated__ ...
##   ..- attr(*, "class")= chr "regsubsets"
##  - attr(*, "class")= chr "summary.regsubsets"
plot(f.e$cp, xlab="Number of predictors", 
     ylab="Cp", col="red", pch=16)

model4 <- lm(gpm ~ horsepower + weight + year + as.factor(AutoCopy$origin) + 
               (horsepower + weight + year)^2 + 
               I(horsepower^2) + I(horsepower^3) +
               I(weight^2) + I(weight^3) 
               , data = AutoCopy)

model4.exh <- regsubsets(gpm ~ horsepower + weight + year + 
                           as.factor(AutoCopy$origin) + 
               (horsepower + weight + year)^2 + 
               I(horsepower^2) + I(horsepower^3) +
               I(weight^2) + I(weight^3)
               , data = AutoCopy,  nvmax=25, method="exhaustive")

f.e <- summary(model4.exh)
names(f.e)
## [1] "which"  "rsq"    "rss"    "adjr2"  "cp"     "bic"    "outmat" "obj"
str(f.e)
## List of 8
##  $ which : logi [1:12, 1:13] TRUE TRUE TRUE TRUE TRUE TRUE ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:12] "1" "2" "3" "4" ...
##   .. ..$ : chr [1:13] "(Intercept)" "horsepower" "weight" "year" ...
##  $ rsq   : num [1:12] 0.785 0.878 0.883 0.885 0.886 ...
##  $ rss   : num [1:12] 0.0232 0.0132 0.0127 0.0125 0.0124 ...
##  $ adjr2 : num [1:12] 0.785 0.877 0.882 0.883 0.884 ...
##  $ cp    : num [1:12] 344.72 31.11 16.38 11.84 9.79 ...
##  $ bic   : num [1:12] -591 -806 -816 -817 -815 ...
##  $ outmat: chr [1:12, 1:12] " " " " " " " " ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:12] "1  ( 1 )" "2  ( 1 )" "3  ( 1 )" "4  ( 1 )" ...
##   .. ..$ : chr [1:12] "horsepower" "weight" "year" "as.factor(AutoCopy$origin)2" ...
##  $ obj   :List of 28
##   ..$ np       : int 13
##   ..$ nrbar    : int 78
##   ..$ d        : num [1:13] 3.92e+02 3.09e+23 1.65e+07 1.84e+12 7.58e+07 ...
##   ..$ rbar     : num [1:78] 3.31e+10 2.98e+03 3.39e+05 7.88e+03 2.25e+05 ...
##   ..$ thetab   : num [1:13] 4.78e-02 5.16e-13 1.31e-05 3.95e-08 -5.66e-06 ...
##   ..$ first    : int 2
##   ..$ last     : int 13
##   ..$ vorder   : int [1:13] 1 10 3 11 12 13 4 7 8 6 ...
##   ..$ tol      : num [1:13] 9.90e-09 8.27e+02 4.10e-05 6.49e-03 1.10e-04 ...
##   ..$ rss      : num [1:13] 0.1083 0.0259 0.0231 0.0203 0.0178 ...
##   ..$ bound    : num [1:13] 0.1083 0.0232 0.0132 0.0127 0.0125 ...
##   ..$ nvmax    : int 13
##   ..$ ress     : num [1:13, 1] 0.1083 0.0232 0.0132 0.0127 0.0125 ...
##   ..$ ir       : int 13
##   ..$ nbest    : int 1
##   ..$ lopt     : int [1:91, 1] 1 1 11 1 13 3 1 13 3 12 ...
##   ..$ il       : int 91
##   ..$ ier      : int 0
##   ..$ xnames   : chr [1:13] "(Intercept)" "horsepower" "weight" "year" ...
##   ..$ method   : chr "exhaustive"
##   ..$ force.in : Named logi [1:13] TRUE FALSE FALSE FALSE FALSE FALSE ...
##   .. ..- attr(*, "names")= chr [1:13] "" "horsepower" "weight" "year" ...
##   ..$ force.out: Named logi [1:13] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   .. ..- attr(*, "names")= chr [1:13] "" "horsepower" "weight" "year" ...
##   ..$ sserr    : num 0.012
##   ..$ intercept: logi TRUE
##   ..$ lindep   : logi [1:13] FALSE FALSE FALSE FALSE FALSE FALSE ...
##   ..$ nullrss  : num 0.108
##   ..$ nn       : int 392
##   ..$ call     : language regsubsets.formula(gpm ~ horsepower + weight + year + as.factor(AutoCopy$origin) +      (horsepower + weight + ye| __truncated__ ...
##   ..- attr(*, "class")= chr "regsubsets"
##  - attr(*, "class")= chr "summary.regsubsets"
plot(f.e$cp, xlab="Number of predictors", 
     ylab="Cp", col="red", pch=16)

  3. Use Mallows's \(C_p\) or BIC to select the model.

Using the “elbow” rule, \(C_p\) appears to level off at around 6 predictors, so we select the size-6 model.

After inspecting the variables chosen by the optimal size-6 model, we found that the third and fourth terms are year squared and year cubed, one with a negative and the other with a positive coefficient. This is not easy to interpret: why would year squared decrease gpm while year cubed increases it?

For easier interpretability, we elect to exclude the higher-order terms in year from the final model. After re-running the \(C_p\) selection process, the new optimal model has size four (again by the elbow rule).
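To make the selection criteria concrete, here is a minimal sketch that computes Mallows's \(C_p\) and BIC by hand for a short list of nested models. It assumes the built-in mtcars data as a stand-in for AutoCopy, and the candidate formulas are illustrative; it is not the exhaustive search that regsubsets performs.

```r
# Stand-in data: mtcars with a GPM response (assumption, not the Auto data)
cars <- transform(mtcars, gpm = 1 / mpg)

# Illustrative nested candidates of increasing size
formulas <- list(
  gpm ~ wt,
  gpm ~ wt + hp,
  gpm ~ wt + hp + qsec,
  gpm ~ wt + hp + qsec + drat
)
fits <- lapply(formulas, lm, data = cars)

n <- nrow(cars)
full <- fits[[length(fits)]]
# Error variance estimated from the largest model
sigma2_full <- sum(resid(full)^2) / df.residual(full)

rss <- sapply(fits, function(f) sum(resid(f)^2))
p <- sapply(fits, function(f) length(coef(f)))  # parameters incl. intercept

# Mallows's Cp = RSS_p / sigma2_full + 2p - n; BIC from stats::BIC()
cp <- rss / sigma2_full + 2 * p - n
bic <- sapply(fits, BIC)
data.frame(predictors = p - 1, cp = cp, bic = bic)
```

By construction the largest model has \(C_p\) equal to its number of parameters, so a smaller model is attractive when its \(C_p\) is already close to its own parameter count.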

1.2 Describe the final model and its accuracy. Include diagnostic plots with particular focus on the model residuals.

  • Summarize the effects found.
  • Predict the mpg of a car that is: built in 1983, in the US, red, 180 inches long, 8 cylinders, 350 displacement, 260 as horsepower, and weighs 4,000 pounds. Give a 95% CI.
  • Any suggestions as to how to improve the quality of the study?

After dropping the higher-order terms in year for interpretability, the selected model has the following terms:

  • weight increasing gpm
  • horsepower:weight decreasing gpm
  • horsepower:year increasing gpm
  • weight:year decreasing gpm

This model is far more interpretable, with weight and horsepower:year telling easy-to-follow stories. As cars get heavier, they use more gas. As cars get newer and more powerful, their gpm increases. As cars get heavier and newer, they consume less fuel per mile than an older car of similar weight.
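Because weight enters through a main effect and two interactions, its effect on gpm depends on horsepower and year. The sketch below evaluates that slope using the rounded coefficients reported by coef(model4.exh, 4); the horsepower and year values plugged in are illustrative assumptions.

```r
# Rounded coefficients as reported by coef(model4.exh, 4)
b <- c(weight = 5.60e-05, hp_weight = -2.43e-08,
       hp_year = 2.08e-06, weight_year = -5.43e-07)

# d(gpm)/d(weight) = b_weight + b_hp_weight * horsepower + b_weight_year * year
weight_slope <- function(horsepower, year) {
  unname(b["weight"] + b["hp_weight"] * horsepower + b["weight_year"] * year)
}

# Illustrative 100-hp car in three model years (year is coded 70, 76, 82)
slopes <- sapply(c(70, 76, 82), function(y) weight_slope(100, y))
slopes          # extra gallons per mile for each additional pound
slopes * 1000   # the same effect per additional 1000 pounds
```

The slope stays positive but shrinks for newer model years, which matches the reading that a newer car pays a smaller fuel penalty for the same extra weight.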

coef(model3.exh,6)
##     (Intercept)          weight            year       I(year^2)       I(year^3) 
##       -1.11e+01        4.35e-05        4.35e-01       -5.65e-03        2.44e-05 
## horsepower:year     weight:year 
##        1.16e-06       -4.20e-07
coef(model4.exh,4)
##       (Intercept)            weight horsepower:weight   horsepower:year 
##         -4.53e-03          5.60e-05         -2.43e-08          2.08e-06 
##       weight:year 
##         -5.43e-07
fit.final <- lm(gpm ~ weight  + horsepower:weight + horsepower:year + weight:year, AutoCopy )   
par(mfrow=c(1,2))
plot(fit.final, 1)
plot(fit.final, 2) 

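newcar <- AutoCopy[1,] # Get the right format and the variable names
newcar$year <- 83      # model year is coded modulo 100 (1983 -> 83)
newcar$origin <- as.factor(1)  # 1 = American
newcar$cylinders <- 8
newcar$displacement <- 350
newcar$horsepower <- 260
newcar$weight <- 4000
# Only weight, horsepower and year enter fit.final; colour and length are not in the data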

newcar.gpm <- predict(fit.final, newcar, interval="confidence", se.fit=TRUE) 
newcar.gpm
## $fit
##      fit    lwr    upr
## 1 0.0585 0.0532 0.0638
## 
## $se.fit
## [1] 0.0027
## 
## $df
## [1] 387
## 
## $residual.scale
## [1] 0.00568
newcar.mpg <- 1/newcar.gpm$fit
newcar.mpg
##    fit  lwr  upr
## 1 17.1 18.8 15.7

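The 95% confidence interval for the predicted MPG ranges from 15.7 to 18.8 mpg, with a fitted value of 17.1 mpg. Because taking the reciprocal reverses the order of the bounds, the columns labelled lwr and upr in the output above are swapped. Note that interval="confidence" gives an interval for the mean response; a prediction interval for a single car (interval="prediction") would be wider.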
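The back-transformation can be checked directly. This sketch takes the rounded GPM interval printed above, inverts it, and relabels the bounds so that lwr is again the smaller value.

```r
# Rounded GPM fit and 95% bounds from the predict() output above
gpm_ci <- c(fit = 0.0585, lwr = 0.0532, upr = 0.0638)

# 1/x is decreasing, so the upper GPM bound becomes the lower MPG bound
mpg_ci <- c(fit = unname(1 / gpm_ci["fit"]),
            lwr = unname(1 / gpm_ci["upr"]),
            upr = unname(1 / gpm_ci["lwr"]))
round(mpg_ci, 1)
```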

How can we improve the model?

Using principal components could possibly lead to a better “fit” at the expense of interpretability. If we want to retain a high level of interpretability, we have to accept a bit of modeling error.
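As a rough illustration of the principal-components idea, the sketch below regresses GPM on the leading component scores of a few correlated predictors. It assumes the built-in mtcars data as a stand-in for the Auto data, and the choice of three predictors and two components is arbitrary.

```r
# Stand-in data: mtcars with a GPM response (assumption, not the Auto data)
cars <- transform(mtcars, gpm = 1 / mpg)

# Principal components of three correlated, standardized predictors
pca <- prcomp(cars[, c("hp", "wt", "disp")], center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x)
scores$gpm <- cars$gpm

# Regression on the first two components vs. on the raw predictors
fit_pcr <- lm(gpm ~ PC1 + PC2, data = scores)
fit_raw <- lm(gpm ~ hp + wt + disp, data = cars)

summary(pca)$importance   # variance explained by each component
c(pcr = summary(fit_pcr)$r.squared, raw = summary(fit_raw)$r.squared)
```

Each component mixes all three predictors, so its coefficient no longer describes the effect of any single feature; that is the interpretability cost mentioned above.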

Perhaps most importantly, using cluster analysis to group similar automobiles could lead to a better result, since there are likely influences that are neither linear nor polynomial in the variables (no power of weight will pick up whether or not the car is a pickup truck, for example).
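One way this could be tried is sketched below: cluster the cars on standardized features with k-means and add the cluster label as a factor. The mtcars data, the three features and the choice of three clusters are all illustrative assumptions.

```r
set.seed(1)  # k-means uses random starting centers

# Stand-in data: mtcars with a GPM response (assumption, not the Auto data)
cars <- transform(mtcars, gpm = 1 / mpg)

# Cluster on standardized features so no single scale dominates
features <- scale(cars[, c("hp", "wt", "disp")])
km <- kmeans(features, centers = 3, nstart = 20)
cars$cluster <- factor(km$cluster)

# Does the cluster label add anything beyond weight alone?
fit_base <- lm(gpm ~ wt, data = cars)
fit_cluster <- lm(gpm ~ wt + cluster, data = cars)
anova(fit_base, fit_cluster)
```

The F test in the anova table indicates whether the vehicle groups explain variation in GPM that weight alone misses.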

2 Case study 2: COVID

library(data.table)    # fread()
library(summarytools)  # dfSummary(), view()

# county-level socioeconomic information
county_data <- fread("data/covid_county.csv")
# county-level COVID cases and deaths
covid_rate <- fread("data/covid_rates.csv")
# county-level lockdown dates
covid_intervention <- fread("data/covid_intervention.csv")

2.1 Understand the data

view(dfSummary(covid_rate),method = "render")

Data Frame Summary

covid_rate

Dimensions: 1008984 x 8
Duplicates: 0
| No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|---|
| 1 | FIPS [integer] | Mean (sd): 30572 (15054); min ≤ med ≤ max: 1001 ≤ 29179 ≤ 56045; IQR (CV): 26072 (0.5) | 3108 distinct values | 1008984 (100.0%) | 0 (0.0%) |
| 2 | date [IDate, Date] | min: 2020-01-21; med: 2020-09-11; max: 2021-02-20; range: 1y 0m 30d | 397 distinct values | 1008984 (100.0%) | 0 (0.0%) |
| 3 | County [character] | 1. Washington; 2. Jefferson; 3. Franklin; 4. Jackson; 5. Lincoln; 6. Madison; 7. Montgomery; 8. Union; 9. Clay; 10. Marion; [ 1809 others ] | 10358 (1.0%); 8669 (0.9%); 8282 (0.8%); 7839 (0.8%); 7801 (0.8%); 6630 (0.7%); 6035 (0.6%); 5926 (0.6%); 5862 (0.6%); 5613 (0.6%); 935969 (92.8%) | 1008984 (100.0%) | 0 (0.0%) |
| 4 | State [character] | 1. Texas; 2. Georgia; 3. Virginia; 4. Kentucky; 5. Missouri; 6. Illinois; 7. North Carolina; 8. Kansas; 9. Iowa; 10. Tennessee; [ 39 others ] | 79955 (7.9%); 53032 (5.3%); 43638 (4.3%); 38890 (3.9%); 36684 (3.6%); 33241 (3.3%); 33108 (3.3%); 32113 (3.2%); 32074 (3.2%); 31543 (3.1%); 594706 (58.9%) | 1008984 (100.0%) | 0 (0.0%) |
| 5 | cum_cases [integer] | Mean (sd): 2933 (14747); min ≤ med ≤ max: 0 ≤ 366 ≤ 1179633; IQR (CV): 1503 (5) | 39874 distinct values | 1008984 (100.0%) | 0 (0.0%) |
| 6 | cum_deaths [integer] | Mean (sd): 65.6 (323); min ≤ med ≤ max: 0 ≤ 7 ≤ 19793; IQR (CV): 30 (4.9) | 4494 distinct values | 1008984 (100.0%) | 0 (0.0%) |
| 7 | week [integer] | Mean (sd): 34.1 (13.6); min ≤ med ≤ max: 1 ≤ 34 ≤ 57; IQR (CV): 23 (0.4) | 57 distinct values | 1008984 (100.0%) | 0 (0.0%) |
| 8 | TotalPopEst2019 [integer] | Mean (sd): 112405 (356633); min ≤ med ≤ max: 169 ≤ 27772 ≤ 1e+07; IQR (CV): 62056 (3.2) | 3057 distinct values | 1008984 (100.0%) | 0 (0.0%) |

Generated by summarytools 1.0.0 (R version 4.1.2)
2022-02-27

view(dfSummary(county_data),method = "render")

Data Frame Summary

county_data

Dimensions: 3279 x 208
Duplicates: 0
| No | Variable | Stats / Values | Freqs (% of Valid) | Valid | Missing |
|---|---|---|---|---|---|
| 1 | FIPS [integer] | Mean (sd): 31342 (16341); min ≤ med ≤ max: 0 ≤ 30019 ≤ 72153; IQR (CV): 27082 (0.5) | 3278 distinct values | 3279 (100.0%) | 0 (0.0%) |
| 2 | State [character] | 1. TX; 2. GA; 3. VA; 4. KY; 5. MO; 6. KS; 7. IL; 8. NC; 9. IA; 10. TN; [ 43 others ] | 255 (7.8%); 160 (4.9%); 135 (4.1%); 121 (3.7%); 116 (3.5%); 106 (3.2%); 103 (3.1%); 101 (3.1%); 100 (3.0%); 96 (2.9%); 1986 (60.6%) | 3279 (100.0%) | 0 (0.0%) |
| 3 | County [character] | 1. Washington; 2. Franklin; 3. Jefferson; 4. Jackson; 5. Lincoln; 6. Madison; 7. Clay; 8. Montgomery; 9. Union; 10. Marion; [ 1937 others ] | 32 (1.0%); 26 (0.8%); 26 (0.8%); 24 (0.7%); 24 (0.7%); 20 (0.6%); 18 (0.5%); 18 (0.5%); 18 (0.5%); 17 (0.5%); 3056 (93.2%) | 3279 (100.0%) | 0 (0.0%) |
| 4 | MedHHInc [integer] | Mean (sd): 52945 (13879); min ≤ med ≤ max: 25385 ≤ 50748 ≤ 140382; IQR (CV): 15237 (0.3) | 3081 distinct values | 3193 (97.4%) | 86 (2.6%) |
| 5 | PerCapitaInc [integer] | Mean (sd): 26720 (6948); min ≤ med ≤ max: 5974 ≤ 26169 ≤ 72832; IQR (CV): 7553 (0.3) | 3021 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 6 | PovertyUnder18Pct [numeric] | Mean (sd): 21 (8.9); min ≤ med ≤ max: 2.5 ≤ 20.1 ≤ 68.3; IQR (CV): 11.7 (0.4) | 425 distinct values | 3193 (97.4%) | 86 (2.6%) |
| 7 | PovertyAllAgesPct [numeric] | Mean (sd): 15.1 (6.1); min ≤ med ≤ max: 2.6 ≤ 14.1 ≤ 54; IQR (CV): 7.5 (0.4) | 317 distinct values | 3193 (97.4%) | 86 (2.6%) |
| 8 | Deep_Pov_All [numeric] | Mean (sd): 7.1 (4.6); min ≤ med ≤ max: 0 ≤ 6.2 ≤ 44; IQR (CV): 3.7 (0.6) | 3270 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 9 | Deep_Pov_Children [numeric] | Mean (sd): 10.2 (7.5); min ≤ med ≤ max: 0 ≤ 8.7 ≤ 69.2; IQR (CV): 7.1 (0.7) | 3222 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 10 | PovertyUnder18Num [integer] | Mean (sd): 12212 (235886); min ≤ med ≤ max: 5 ≤ 1200 ≤ 1.3e+07; IQR (CV): 2463 (19.3) | 2264 distinct values | 3193 (97.4%) | 86 (2.6%) |
| 11 | PovertyAllAgesNum [integer] | Mean (sd): 39323 (758374); min ≤ med ≤ max: 5 ≤ 3876 ≤ 41852315; IQR (CV): 8253 (19.3) | 2803 distinct values | 3193 (97.4%) | 86 (2.6%) |
| 12 | PopChangeRate1819 [numeric] | Mean (sd): 0.1 (1.3); min ≤ med ≤ max: -15.5 ≤ 0 ≤ 14.2; IQR (CV): 1.2 (13.3) | 2182 distinct values | 3195 (97.4%) | 84 (2.6%) |
| 13 | PopChangeRate1019 [numeric] | Mean (sd): 0.7 (8.9); min ≤ med ≤ max: -33.5 ≤ -0.6 ≤ 134; IQR (CV): 8.9 (12.7) | 3058 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 14 | TotalPopEst2019 [integer] | Mean (sd): 301847 (5867536); min ≤ med ≤ max: 86 ≤ 26687 ≤ 3.28e+08; IQR (CV): 60369 (19.4) | 3216 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 15 | NetMigrationRate1019 [numeric] | Mean (sd): 0 (7.6); min ≤ med ≤ max: -32.2 ≤ -1.1 ≤ 116; IQR (CV): 7.2 (1427) | 2937 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 16 | NaturalChangeRate1019 [numeric] | Mean (sd): 1 (3.7); min ≤ med ≤ max: -11 ≤ 0.6 ≤ 23.1; IQR (CV): 4.4 (3.5) | 2815 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 17 | Net_International_Migration_Rate_2010_2019 [numeric] | Mean (sd): 0.9 (1.6); min ≤ med ≤ max: -1.2 ≤ 0.4 ≤ 21.3; IQR (CV): 1 (1.8) | 1801 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 18 | PopChangeRate0010 [numeric] | Mean (sd): 5.2 (13); min ≤ med ≤ max: -46.6 ≤ 3.2 ≤ 110; IQR (CV): 12.7 (2.5) | 2197 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 19 | NetMigrationRate0010 [numeric] | Mean (sd): 1 (11.1); min ≤ med ≤ max: -57.4 ≤ -0.3 ≤ 85.2; IQR (CV): 10.5 (11.2) | 2053 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 20 | NaturalChangeRate0010 [numeric] | Mean (sd): 2.8 (4.7); min ≤ med ≤ max: -14.5 ≤ 2.4 ≤ 26.4; IQR (CV): 5.5 (1.7) | 1457 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 21 | Immigration_Rate_2000_2010 [numeric] | Mean (sd): 1.3 (1.8); min ≤ med ≤ max: -5.2 ≤ 0.6 ≤ 17.7; IQR (CV): 1.4 (1.5) | 3161 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 22 | PopDensity2010 [numeric] | Mean (sd): 285 (1717); min ≤ med ≤ max: 0 ≤ 47 ≤ 69468; IQR (CV): 114 (6) | 2967 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 23 | Under18Pct2010 [numeric] | Mean (sd): 23.5 (3.3); min ≤ med ≤ max: 0 ≤ 23.4 ≤ 41.6; IQR (CV): 3.6 (0.1) | 1169 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 24 | Age65AndOlderPct2010 [numeric] | Mean (sd): 15.8 (4.1); min ≤ med ≤ max: 3.5 ≤ 15.4 ≤ 43.4; IQR (CV): 5 (0.3) | 1381 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 25 | WhiteNonHispanicPct2010 [numeric] | Mean (sd): 76.3 (22.9); min ≤ med ≤ max: 0.2 ≤ 85 ≤ 99.2; IQR (CV): 28.7 (0.3) | 2305 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 26 | BlackNonHispanicPct2010 [numeric] | Mean (sd): 8.6 (14.3); min ≤ med ≤ max: 0 ≤ 1.8 ≤ 85.4; IQR (CV): 9.1 (1.7) | 1404 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 27 | AsianNonHispanicPct2010 [numeric] | Mean (sd): 1.2 (2.5); min ≤ med ≤ max: 0 ≤ 0.5 ≤ 43; IQR (CV): 0.7 (2.2) | 499 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 28 | NativeAmericanNonHispanicPct2010 [numeric] | Mean (sd): 1.8 (7.5); min ≤ med ≤ max: 0 ≤ 0.3 ≤ 95; IQR (CV): 0.4 (4.1) | 485 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 29 | HispanicPct2010 [numeric] | Mean (sd): 10.5 (19); min ≤ med ≤ max: 0 ≤ 3.5 ≤ 99.7; IQR (CV): 7.5 (1.8) | 1440 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 30 | MultipleRacePct2010 [numeric] | Mean (sd): 2 (1.6); min ≤ med ≤ max: 0.1 ≤ 1.6 ≤ 29.5; IQR (CV): 1.2 (0.8) | 538 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 31 | NonHispanicWhitePopChangeRate0010 [numeric] | Mean (sd): 0.5 (13.8); min ≤ med ≤ max: -94.3 ≤ -0.7 ≤ 149; IQR (CV): 12.3 (25.7) | 2200 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 32 | NonHispanicBlackPopChangeRate0010 [numeric] | Mean (sd): 69.6 (485); min ≤ med ≤ max: -100 ≤ 13.6 ≤ 21950; IQR (CV): 64.6 (7) | 2444 distinct values | 3206 (97.8%) | 73 (2.2%) |
| 33 | NonHispanicAsianPopChangeRate0010 [numeric] | Mean (sd): 74.2 (129); min ≤ med ≤ max: -100 ≤ 47.3 ≤ 2900; IQR (CV): 73.8 (1.7) | 2257 distinct values | 3228 (98.4%) | 51 (1.6%) |
| 34 | NonHispanicNativeAmericanPopChangeRate0010 [numeric] | Mean (sd): 17.7 (66.4); min ≤ med ≤ max: -100 ≤ 9.1 ≤ 1500; IQR (CV): 39 (3.8) | 2182 distinct values | 3244 (98.9%) | 35 (1.1%) |
| 35 | HispanicPopChangeRate0010 [numeric] | Mean (sd): 81.9 (91); min ≤ med ≤ max: -100 ≤ 66.4 ≤ 1740; IQR (CV): 73.9 (1.1) | 2947 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 36 | MultipleRacePopChangeRate0010 [numeric] | Mean (sd): 59.2 (57.2); min ≤ med ≤ max: -87.5 ≤ 53.5 ≤ 850; IQR (CV): 58.8 (1) | 2793 distinct values | 3265 (99.6%) | 14 (0.4%) |
| 37 | WhiteNonHispanicNum2010 [integer] | Mean (sd): 180408 (3498477); min ≤ med ≤ max: 24 ≤ 19996 ≤ 1.97e+08; IQR (CV): 47308 (19.4) | 3201 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 38 | BlackNonHispanicNum2010 [integer] | Mean (sd): 34544 (675236); min ≤ med ≤ max: 0 ≤ 664 ≤ 37685848; IQR (CV): 5466 (19.5) | 1999 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 39 | AsianNonHispanicNum2010 [integer] | Mean (sd): 13260 (271334); min ≤ med ≤ max: 0 ≤ 115 ≤ 14465124; IQR (CV): 635 (20.5) | 1251 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 40 | NativeAmericanNonHispanicNum2010 [integer] | Mean (sd): 2060 (40450); min ≤ med ≤ max: 0 ≤ 100 ≤ 2247098; IQR (CV): 367 (19.6) | 1057 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 41 | HispanicNum2010 [integer] | Mean (sd): 47407 (944224); min ≤ med ≤ max: 0 ≤ 1001 ≤ 50477594; IQR (CV): 4884 (19.9) | 2259 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 42 | MultipleRaceNum2010 [integer] | Mean (sd): 8295 (162882); min ≤ med ≤ max: 1 ≤ 432 ≤ 9009073; IQR (CV): 1357 (19.6) | 1724 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 43 | ForeignBornPct [numeric] | Mean (sd): 4.8 (5.7); min ≤ med ≤ max: 0 ≤ 2.8 ≤ 53.3; IQR (CV): 4.5 (1.2) | 3180 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 44 | ForeignBornEuropePct [numeric] | Mean (sd): 0.6 (0.7); min ≤ med ≤ max: 0 ≤ 0.3 ≤ 7.7; IQR (CV): 0.5 (1.3) | 2991 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 45 | ForeignBornMexPct [numeric] | Mean (sd): 1.9 (3.5); min ≤ med ≤ max: 0 ≤ 0.6 ≤ 39.5; IQR (CV): 1.8 (1.8) | 2920 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 46 | NonEnglishHHPct [numeric] | Mean (sd): 3.6 (11.3); min ≤ med ≤ max: 0 ≤ 0.9 ≤ 89.5; IQR (CV): 2 (3.2) | 2986 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 47 | Ed1LessThanHSPct [numeric] | Mean (sd): 13.7 (6.6); min ≤ med ≤ max: 1.2 ≤ 12.2 ≤ 66.3; IQR (CV): 8.8 (0.5) | 3270 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 48 | Ed2HSDiplomaOnlyPct [numeric] | Mean (sd): 34.1 (7.2); min ≤ med ≤ max: 5.5 ≤ 34.3 ≤ 55.6; IQR (CV): 9.5 (0.2) | 3265 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 49 | Ed3SomeCollegePct [numeric] | Mean (sd): 21.6 (4.1); min ≤ med ≤ max: 4.1 ≤ 21.6 ≤ 38.7; IQR (CV): 5 (0.2) | 3270 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 50 | Ed4AssocDegreePct [numeric] | Mean (sd): 8.9 (2.6); min ≤ med ≤ max: 1.1 ≤ 8.7 ≤ 21.4; IQR (CV): 3.4 (0.3) | 3266 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 51 | Ed5CollegePlusPct [numeric] | Mean (sd): 21.7 (9.4); min ≤ med ≤ max: 0 ≤ 19.5 ≤ 78.5; IQR (CV): 10.8 (0.4) | 3271 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 52 | AvgHHSize [numeric] | Mean (sd): 2.5 (0.3); min ≤ med ≤ max: 1.3 ≤ 2.5 ≤ 5; IQR (CV): 0.3 (0.1) | 183 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 53 | FemaleHHPct [numeric] | Mean (sd): 11.4 (4.6); min ≤ med ≤ max: 0 ≤ 10.6 ≤ 40.8; IQR (CV): 5 (0.4) | 3260 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 54 | HH65PlusAlonePct [numeric] | Mean (sd): 12.6 (3.1); min ≤ med ≤ max: 2.8 ≤ 12.4 ≤ 31.8; IQR (CV): 3.7 (0.2) | 3258 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 55 | OwnHomePct [numeric] | Mean (sd): 71.3 (8.2); min ≤ med ≤ max: 4.3 ≤ 72.4 ≤ 92.4; IQR (CV): 9.5 (0.1) | 3269 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 56 | ForeignBornNum [integer] | Mean (sd): 40884 (812006); min ≤ med ≤ max: 0 ≤ 672 ≤ 43538827; IQR (CV): 3000 (19.9) | 2026 distinct values | 3195 (97.4%) | 84 (2.6%) |
| 57 | TotalPopACS [integer] | Mean (sd): 297005 (5772068); min ≤ med ≤ max: 75 ≤ 26719 ≤ 3.23e+08; IQR (CV): 59948 (19.4) | 3210 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 58 | ForeignBornAfricaPct [numeric] | Mean (sd): 2.8 (145); min ≤ med ≤ max: 0 ≤ 0.1 ≤ 8182; IQR (CV): 0.2 (52.3) | 2101 distinct values | 3195 (97.4%) | 84 (2.6%) |
| 59 | Ed3SomeCollegeNum [integer] | Mean (sd): 41363 (804615); min ≤ med ≤ max: 12 ≤ 3753 ≤ 4.5e+07; IQR (CV): 8834 (19.5) | 2889 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 60 | Ed2HSDiplomaOnlyNum [integer] | Mean (sd): 54532 (1056336); min ≤ med ≤ max: 15 ≤ 6578 ≤ 59265308; IQR (CV): 12964 (19.4) | 3020 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 61 | Ed1LessThanHSNum [integer] | Mean (sd): 24885 (486088); min ≤ med ≤ max: 4 ≤ 2652 ≤ 26948057; IQR (CV): 5234 (19.5) | 2712 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 62 | TotalPop25Plus [integer] | Mean (sd): 200959 (3904200); min ≤ med ≤ max: 69 ≤ 18362 ≤ 2.18e+08; IQR (CV): 39924 (19.4) | 3180 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 63 | Ed5CollegePlusNum [integer] | Mean (sd): 63309 (1232394); min ≤ med ≤ max: 0 ≤ 3323 ≤ 68867051; IQR (CV): 9897 (19.5) | 2829 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 64 | TotalOccHU [integer] | Mean (sd): 110145 (2137025); min ≤ med ≤ max: 33 ≤ 10202 ≤ 1.2e+08; IQR (CV): 22737 (19.4) | 3103 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 65 | ForeignBornAsiaPct [numeric] | Mean (sd): 1.7 (31.7); min ≤ med ≤ max: 0 ≤ 0.5 ≤ 1791; IQR (CV): 0.9 (19.2) | 2954 distinct values | 3195 (97.4%) | 84 (2.6%) |
| 66 | Ed4AssocDegreeNum [integer] | Mean (sd): 16886 (327446); min ≤ med ≤ max: 7 ≤ 1593 ≤ 18338323; IQR (CV): 3943 (19.4) | 2501 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 67 | ForeignBornEuropeNum [integer] | Mean (sd): 4487 (87789); min ≤ med ≤ max: 0 ≤ 83 ≤ 4778170; IQR (CV): 367 (19.6) | 1072 distinct values | 3195 (97.4%) | 84 (2.6%) |
| 68 | NonEnglishHHNum [integer] | Mean (sd): 5134 (98171); min ≤ med ≤ max: 0 ≤ 89 ≤ 5323080; IQR (CV): 457 (19.1) | 1155 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 69 | HH65PlusAloneNum [integer] | Mean (sd): 11843 (229373); min ≤ med ≤ max: 4 ≤ 1266 ≤ 12868890; IQR (CV): 2557 (19.4) | 2351 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 70 | OwnHomeNum [integer] | Mean (sd): 70322 (1361695); min ≤ med ≤ max: 2 ≤ 7314 ≤ 76444810; IQR (CV): 15723 (19.4) | 3061 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 71 | FemaleHHNum [integer] | Mean (sd): 13891 (269278); min ≤ med ≤ max: 0 ≤ 1189 ≤ 15058180; IQR (CV): 2778 (19.4) | 2316 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 72 | TotalHH [integer] | Mean (sd): 110145 (2137025); min ≤ med ≤ max: 33 ≤ 10202 ≤ 1.2e+08; IQR (CV): 22737 (19.4) | 3103 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 73 | ForeignBornCentralSouthAmPct [numeric] | Mean (sd): 2.5 (3.9); min ≤ med ≤ max: 0 ≤ 1.1 ≤ 39.5; IQR (CV): 2.4 (1.6) | 3035 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 74 | ForeignBornCentralSouthAmNum [integer] | Mean (sd): 16796 (339763); min ≤ med ≤ max: 0 ≤ 280 ≤ 17881956; IQR (CV): 1440 (20.2) | 1565 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 75 | ForeignBornCaribPct [numeric] | Mean (sd): 0.2 (1); min ≤ med ≤ max: 0 ≤ 0 ≤ 32; IQR (CV): 0.2 (4.2) | 1951 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 76 | ForeignBornCaribNum [integer] | Mean (sd): 3993 (85882); min ≤ med ≤ max: 0 ≤ 10 ≤ 4251037; IQR (CV): 88 (21.5) | 665 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 77 | ForeignBornAfricaNum [integer] | Mean (sd): 2021 (39190); min ≤ med ≤ max: 0 ≤ 13 ≤ 2151409; IQR (CV): 109 (19.4) | 735 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 78 | ForeignBornAsiaNum [integer] | Mean (sd): 12585 (252757); min ≤ med ≤ max: 0 ≤ 122 ≤ 13398520; IQR (CV): 648 (20.1) | 1256 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 79 | ForeignBornMexNum [integer] | Mean (sd): 10702 (222034); min ≤ med ≤ max: 0 ≤ 186 ≤ 11394459; IQR (CV): 928 (20.7) | 1399 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 80 | LandAreaSQMiles2010 [numeric] | Mean (sd): 3240 (63263); min ≤ med ≤ max: 2 ≤ 610 ≤ 3531905; IQR (CV): 511 (19.5) | 3235 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 81 | Age65AndOlderNum2010 [integer] | Mean (sd): 37087 (718626); min ≤ med ≤ max: 12 ≤ 4112 ≤ 40267984; IQR (CV): 8177 (19.4) | 2908 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 82 | TotalPop2010 [numeric] | Mean (sd): 284130 (5517126); min ≤ med ≤ max: 68.2 ≤ 26890 ≤ 3.09e+08; IQR (CV): 58668 (19.4) | 3220 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 83 | Under18Num2010 [integer] | Mean (sd): 68273 (1326626); min ≤ med ≤ max: 0 ≤ 6359 ≤ 74181467; IQR (CV): 13681 (19.4) | 3046 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 84 | Net_International_Migration_2000_2010 [numeric] | Mean (sd): 5973 (51531); min ≤ med ≤ max: -4973 ≤ 137 ≤ 1906302; IQR (CV): 704 (8.6) | 1625 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 85 | NaturalChangeNum0010 [numeric] | Mean (sd): 10719 (82811); min ≤ med ≤ max: -27273 ≤ 491 ≤ 3097661; IQR (CV): 2574 (7.7) | 2616 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 86 | NetMigrationNum0010 [numeric] | Mean (sd): 5975 (70245); min ≤ med ≤ max: -838668 ≤ -62.5 ≤ 2114643; IQR (CV): 2704 (11.8) | 2835 distinct values | 3184 (97.1%) | 95 (2.9%) |
| 87 | TotalPopEst2012 [integer] | Mean (sd): 288777 (5608652); min ≤ med ≤ max: 86 ≤ 26901 ≤ 3.14e+08; IQR (CV): 59167 (19.4) | 3212 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 88 | TotalPopEst2013 [integer] | Mean (sd): 290747 (5647610); min ≤ med ≤ max: 89 ≤ 26886 ≤ 3.16e+08; IQR (CV): 59451 (19.4) | 3208 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 89 | TotalPopEst2010 [integer] | Mean (sd): 284671 (5527443); min ≤ med ≤ max: 84 ≤ 26854 ≤ 3.09e+08; IQR (CV): 58777 (19.4) | 3220 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 90 | TotalPopEst2014 [integer] | Mean (sd): 292843 (5689251); min ≤ med ≤ max: 89 ≤ 26944 ≤ 3.18e+08; IQR (CV): 59489 (19.4) | 3208 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 91 | TotalPopEst2011 [integer] | Mean (sd): 286706 (5567688); min ≤ med ≤ max: 90 ≤ 26826 ≤ 3.12e+08; IQR (CV): 59129 (19.4) | 3211 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 92 | Net_International_Migration_2010_2019 [integer] | Mean (sd): 7219 (141089); min ≤ med ≤ max: -1451 ≤ 81 ≤ 7685444; IQR (CV): 503 (19.5) | 1261 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 93 | NaturalChange1019 [integer] | Mean (sd): 10550 (206980); min ≤ med ≤ max: -30357 ≤ 83.5 ≤ 11232413; IQR (CV): 1510 (19.6) | 2134 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 94 | TotalPopEst2015 [integer] | Mean (sd): 294964 (5731377); min ≤ med ≤ max: 88 ≤ 26827 ≤ 3.21e+08; IQR (CV): 59646 (19.4) | 3204 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 95 | TotalPopEst2016 [integer] | Mean (sd): 297057 (5772826); min ≤ med ≤ max: 88 ≤ 26626 ≤ 3.23e+08; IQR (CV): 59973 (19.4) | 3213 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 96 | TotalPopEst2017 [integer] | Mean (sd): 298905 (5809474); min ≤ med ≤ max: 86 ≤ 26676 ≤ 3.25e+08; IQR (CV): 60010 (19.4) | 3207 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 97 | NetMigration1019 [integer] | Mean (sd): 7219 (150698); min ≤ med ≤ max: -678325 ≤ -146 ≤ 7685444; IQR (CV): 1830 (20.9) | 2476 distinct values | 3194 (97.4%) | 85 (2.6%) |
| 98 | TotalPopEst2018 [integer] | Mean (sd): 3e+05 (5839861); min ≤ med ≤ max: 86 ≤ 26702 ≤ 3.27e+08; IQR (CV): 59824 (19.4) | 3219 distinct values | 3273 (99.8%) | 6 (0.2%) |
| 99 | TotalPopEstBase2010 [integer] | Mean (sd): 284230 (5518191); min ≤ med ≤ max: 82 ≤ 26908 ≤ 3.09e+08; IQR (CV): 58771 (19.4) | 3207 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 100 | UnempRate2019 [numeric] | Mean (sd): 4.1 (1.8); min ≤ med ≤ max: 0.7 ≤ 3.7 ≤ 19.3; IQR (CV): 1.7 (0.4) | 123 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 101 | UnempRate2018 [numeric] | Mean (sd): 4.3 (1.9); min ≤ med ≤ max: 1.3 ≤ 3.9 ≤ 19.6; IQR (CV): 1.8 (0.4) | 131 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 102 | UnempRate2017 [numeric] | Mean (sd): 4.8 (2.2); min ≤ med ≤ max: 1.6 ≤ 4.4 ≤ 20.6; IQR (CV): 1.9 (0.5) | 145 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 103 | UnempRate2016 [numeric] | Mean (sd): 5.4 (2.4); min ≤ med ≤ max: 1.7 ≤ 5 ≤ 24.1; IQR (CV): 2.2 (0.4) | 150 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 104 | UnempRate2015 [numeric] | Mean (sd): 5.7 (2.5); min ≤ med ≤ max: 1.8 ≤ 5.3 ≤ 24.5; IQR (CV): 2.4 (0.4) | 152 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 105 | UnempRate2014 [numeric] | Mean (sd): 6.5 (2.9); min ≤ med ≤ max: 1.2 ≤ 6.1 ≤ 26.4; IQR (CV): 2.9 (0.4) | 178 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 106 | UnempRate2010 [numeric] | Mean (sd): 9.6 (3.5); min ≤ med ≤ max: 2.1 ≤ 9.3 ≤ 28.8; IQR (CV): 4.2 (0.4) | 203 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 107 | UnempRate2007 [numeric] | Mean (sd): 5.1 (2.1); min ≤ med ≤ max: 1.5 ≤ 4.7 ≤ 20.4; IQR (CV): 2.1 (0.4) | 138 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 108 | PctEmpChange1019 [numeric] | Mean (sd): 4.9 (17.5); min ≤ med ≤ max: -34.1 ≤ 3.9 ≤ 703; IQR (CV): 13.7 (3.6) | 2221 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 109 | PctEmpChange1819 [numeric] | Mean (sd): 0.8 (3); min ≤ med ≤ max: -28.8 ≤ 0.8 ≤ 104; IQR (CV): 2.1 (3.8) | 834 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 110 | PctEmpChange0719 [numeric] | Mean (sd): -0.3 (28.7); min ≤ med ≤ max: -59.2 ≤ -1.7 ≤ 1316; IQR (CV): 17.6 (-104) | 2395 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 111 | PctEmpChange0710 [numeric] | Mean (sd): -5.2 (10.3); min ≤ med ≤ max: -55 ≤ -5.3 ≤ 91.6; IQR (CV): 10.1 (-2) | 2053 distinct values | 3267 (99.6%) | 12 (0.4%) |
| 112 | PctEmpAgriculture [numeric] | Mean (sd): 5 (6.2); min ≤ med ≤ max: 0 ≤ 2.8 ≤ 59.6; IQR (CV): 5 (1.3) | 3256 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 113 | PctEmpMining [numeric] | Mean (sd): 1.5 (3.4); min ≤ med ≤ max: 0 ≤ 0.3 ≤ 44; IQR (CV): 1.2 (2.2) | 2885 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 114 | PctEmpConstruction [numeric] | Mean (sd): 7.3 (2.4); min ≤ med ≤ max: 0 ≤ 7 ≤ 25.5; IQR (CV): 2.7 (0.3) | 3261 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 115 | PctEmpManufacturing [numeric] | Mean (sd): 12.2 (7.1); min ≤ med ≤ max: 0 ≤ 11.3 ≤ 51.7; IQR (CV): 9.7 (0.6) | 3260 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 116 | PctEmpTrade [numeric] | Mean (sd): 13.7 (2.7); min ≤ med ≤ max: 0.8 ≤ 13.9 ≤ 38.9; IQR (CV): 3 (0.2) | 3265 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 117 | PctEmpTrans [numeric] | Mean (sd): 5.5 (2.1); min ≤ med ≤ max: 0 ≤ 5.2 ≤ 27.7; IQR (CV): 2.4 (0.4) | 3259 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 118 | PctEmpInformation [numeric] | Mean (sd): 1.4 (0.8); min ≤ med ≤ max: 0 ≤ 1.3 ≤ 12.3; IQR (CV): 0.9 (0.6) | 3175 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 119 | PctEmpFIRE [numeric] | Mean (sd): 4.6 (1.9); min ≤ med ≤ max: 0 ≤ 4.3 ≤ 20.6; IQR (CV): 2.2 (0.4) | 3249 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 120 | PctEmpServices [numeric] | Mean (sd): 43.1 (7); min ≤ med ≤ max: 8.3 ≤ 43 ≤ 81.6; IQR (CV): 9.3 (0.2) | 3269 distinct values | 3272 (99.8%) | 7 (0.2%) |
| 121 | PctEmpGovt [numeric] | Mean (sd): 5.7 (3.3); min ≤ med ≤ max: 0 ≤ 4.8 ≤ 38.6; IQR (CV): 3.2 (0.6) | 3266 distinct values | 3272 (99.8%) | 7 (0.2%) |
122 NumCivEmployed [integer]
Mean (sd) : 140356 (2730225)
min ≤ med ≤ max:
36 ≤ 10897 ≤ 1.53e+08
IQR (CV) : 26528 (19.5)
3118 distinct values 3272 (99.8%) 7 (0.2%)
123 UnempRate2011 [numeric]
Mean (sd) : 9 (3.4)
min ≤ med ≤ max:
1.4 ≤ 8.7 ≤ 28.9
IQR (CV) : 4 (0.4)
200 distinct values 3272 (99.8%) 7 (0.2%)
124 NumCivLaborForce2011 [integer]
Mean (sd) : 142130 (2762563)
min ≤ med ≤ max:
66 ≤ 12016 ≤ 1.55e+08
IQR (CV) : 28183 (19.4)
3123 distinct values 3272 (99.8%) 7 (0.2%)
125 NumEmployed2011 [integer]
Mean (sd) : 129380 (2514380)
min ≤ med ≤ max:
62 ≤ 10836 ≤ 1.41e+08
IQR (CV) : 25612 (19.4)
3120 distinct values 3272 (99.8%) 7 (0.2%)
126 NumCivLaborForce2012 [integer]
Mean (sd) : 142593 (2772051)
min ≤ med ≤ max:
67 ≤ 11960 ≤ 1.55e+08
IQR (CV) : 27825 (19.4)
3141 distinct values 3272 (99.8%) 7 (0.2%)
127 NumUnemployed2010 [integer]
Mean (sd) : 13691 (266615)
min ≤ med ≤ max:
4 ≤ 1258 ≤ 14862528
IQR (CV) : 2804 (19.5)
2325 distinct values 3272 (99.8%) 7 (0.2%)
128 NumCivLaborForce2008 [integer]
Mean (sd) : 141610 (2748841)
min ≤ med ≤ max:
43 ≤ 12504 ≤ 1.54e+08
IQR (CV) : 28021 (19.4)
3140 distinct values 3267 (99.6%) 12 (0.4%)
129 NumUnemployed2011 [integer]
Mean (sd) : 12750 (248454)
min ≤ med ≤ max:
4 ≤ 1174 ≤ 13840502
IQR (CV) : 2558 (19.5)
2276 distinct values 3272 (99.8%) 7 (0.2%)
130 NumEmployed2010 [integer]
Mean (sd) : 128146 (2489980)
min ≤ med ≤ max:
67 ≤ 10868 ≤ 1.39e+08
IQR (CV) : 25460 (19.4)
3130 distinct values 3272 (99.8%) 7 (0.2%)
131 NumCivLaborForce2010 [integer]
Mean (sd) : 141838 (2756335)
min ≤ med ≤ max:
71 ≤ 12098 ≤ 1.54e+08
IQR (CV) : 28236 (19.4)
3144 distinct values 3272 (99.8%) 7 (0.2%)
132 NumUnemployed2009 [integer]
Mean (sd) : 13129 (255206)
min ≤ med ≤ max:
4 ≤ 1228 ≤ 14230942
IQR (CV) : 2736 (19.4)
2338 distinct values 3267 (99.6%) 12 (0.4%)
133 NumEmployed2009 [integer]
Mean (sd) : 128526 (2495118)
min ≤ med ≤ max:
39 ≤ 11311 ≤ 1.4e+08
IQR (CV) : 25617 (19.4)
3128 distinct values 3267 (99.6%) 12 (0.4%)
134 NumCivLaborForce2009 [integer]
Mean (sd) : 141655 (2750109)
min ≤ med ≤ max:
43 ≤ 12507 ≤ 1.54e+08
IQR (CV) : 28014 (19.4)
3151 distinct values 3267 (99.6%) 12 (0.4%)
135 UnempRate2008 [numeric]
Mean (sd) : 6 (2.4)
min ≤ med ≤ max:
1.3 ≤ 5.7 ≤ 22.6
IQR (CV) : 2.7 (0.4)
148 distinct values 3267 (99.6%) 12 (0.4%)
136 UnempRate2012 [numeric]
Mean (sd) : 8.1 (3.2)
min ≤ med ≤ max:
1.1 ≤ 7.8 ≤ 27.4
IQR (CV) : 3.7 (0.4)
189 distinct values 3272 (99.8%) 7 (0.2%)
137 NumEmployed2008 [integer]
Mean (sd) : 133388 (2589264)
min ≤ med ≤ max:
40 ≤ 11701 ≤ 1.45e+08
IQR (CV) : 26457 (19.4)
3106 distinct values 3267 (99.6%) 12 (0.4%)
138 UnempRate2009 [numeric]
Mean (sd) : 9.3 (3.5)
min ≤ med ≤ max:
2.1 ≤ 8.8 ≤ 27.4
IQR (CV) : 4.2 (0.4)
195 distinct values 3267 (99.6%) 12 (0.4%)
139 NumUnemployed2008 [integer]
Mean (sd) : 8222 (159719)
min ≤ med ≤ max:
3 ≤ 788 ≤ 8900776
IQR (CV) : 1747 (19.4)
2043 distinct values 3267 (99.6%) 12 (0.4%)
140 NumUnemployed2015 [integer]
Mean (sd) : 7637 (148385)
min ≤ med ≤ max:
4 ≤ 708 ≤ 8283796
IQR (CV) : 1582 (19.4)
1956 distinct values 3272 (99.8%) 7 (0.2%)
141 NumUnemployed2019 [integer]
Mean (sd) : 5515 (107123)
min ≤ med ≤ max:
4 ≤ 499 ≤ 5984808
IQR (CV) : 1126 (19.4)
1740 distinct values 3272 (99.8%) 7 (0.2%)
142 NumCivLaborforce2019 [integer]
Mean (sd) : 149874 (2915158)
min ≤ med ≤ max:
223 ≤ 11809 ≤ 1.63e+08
IQR (CV) : 28515 (19.5)
3140 distinct values 3272 (99.8%) 7 (0.2%)
143 NumUnemployed2018 [integer]
Mean (sd) : 5795 (112551)
min ≤ med ≤ max:
4 ≤ 512 ≤ 6286707
IQR (CV) : 1166 (19.4)
1773 distinct values 3272 (99.8%) 7 (0.2%)
144 NumEmployed2018 [integer]
Mean (sd) : 142510 (2772143)
min ≤ med ≤ max:
205 ≤ 11216 ≤ 1.55e+08
IQR (CV) : 27049 (19.5)
3133 distinct values 3272 (99.8%) 7 (0.2%)
145 NumCivLaborforce2018 [integer]
Mean (sd) : 148305 (2884660)
min ≤ med ≤ max:
211 ≤ 11777 ≤ 1.61e+08
IQR (CV) : 28284 (19.5)
3137 distinct values 3272 (99.8%) 7 (0.2%)
146 NumUnemployed2017 [integer]
Mean (sd) : 6432 (124919)
min ≤ med ≤ max:
5 ≤ 570 ≤ 6975103
IQR (CV) : 1334 (19.4)
1840 distinct values 3272 (99.8%) 7 (0.2%)
147 NumEmployed2017 [integer]
Mean (sd) : 140751 (2737799)
min ≤ med ≤ max:
95 ≤ 11135 ≤ 1.53e+08
IQR (CV) : 26906 (19.5)
3124 distinct values 3272 (99.8%) 7 (0.2%)
148 NumCivLaborforce2017 [integer]
Mean (sd) : 147183 (2862681)
min ≤ med ≤ max:
100 ≤ 11735 ≤ 1.6e+08
IQR (CV) : 28049 (19.4)
3146 distinct values 3272 (99.8%) 7 (0.2%)
149 NumUnemployed2016 [integer]
Mean (sd) : 7122 (138304)
min ≤ med ≤ max:
4 ≤ 660 ≤ 7723517
IQR (CV) : 1502 (19.4)
1938 distinct values 3272 (99.8%) 7 (0.2%)
150 NumEmployed2016 [integer]
Mean (sd) : 138662 (2696926)
min ≤ med ≤ max:
82 ≤ 11035 ≤ 1.51e+08
IQR (CV) : 26506 (19.4)
3116 distinct values 3272 (99.8%) 7 (0.2%)
151 NumCivLaborforce2016 [integer]
Mean (sd) : 145784 (2835184)
min ≤ med ≤ max:
86 ≤ 11670 ≤ 1.59e+08
IQR (CV) : 28034 (19.4)
3112 distinct values 3272 (99.8%) 7 (0.2%)
152 NumCivLaborforce2013 [integer]
Mean (sd) : 142920 (2778954)
min ≤ med ≤ max:
75 ≤ 11868 ≤ 1.55e+08
IQR (CV) : 27642 (19.4)
3131 distinct values 3272 (99.8%) 7 (0.2%)
153 NumEmployed2015 [integer]
Mean (sd) : 136473 (2654156)
min ≤ med ≤ max:
73 ≤ 10904 ≤ 1.49e+08
IQR (CV) : 26088 (19.4)
3117 distinct values 3272 (99.8%) 7 (0.2%)
154 NumEmployed2012 [integer]
Mean (sd) : 131061 (2547581)
min ≤ med ≤ max:
63 ≤ 10944 ≤ 1.43e+08
IQR (CV) : 25719 (19.4)
3129 distinct values 3272 (99.8%) 7 (0.2%)
155 NumUnemployed2014 [integer]
Mean (sd) : 8868 (172446)
min ≤ med ≤ max:
4 ≤ 816 ≤ 9618987
IQR (CV) : 1809 (19.4)
2040 distinct values 3272 (99.8%) 7 (0.2%)
156 NumEmployed2014 [integer]
Mean (sd) : 134474 (2615176)
min ≤ med ≤ max:
75 ≤ 10842 ≤ 1.46e+08
IQR (CV) : 26012 (19.4)
3113 distinct values 3272 (99.8%) 7 (0.2%)
157 NumCivLaborforce2014 [integer]
Mean (sd) : 143342 (2787500)
min ≤ med ≤ max:
79 ≤ 11694 ≤ 1.56e+08
IQR (CV) : 27584 (19.4)
3142 distinct values 3272 (99.8%) 7 (0.2%)
158 UnempRate2013 [numeric]
Mean (sd) : 7.6 (3.1)
min ≤ med ≤ max:
1.2 ≤ 7.3 ≤ 27.4
IQR (CV) : 3.6 (0.4)
196 distinct values 3272 (99.8%) 7 (0.2%)
159 NumUnemployed2013 [integer]
Mean (sd) : 10566 (205637)
min ≤ med ≤ max:
4 ≤ 964 ≤ 11467539
IQR (CV) : 2118 (19.5)
2154 distinct values 3272 (99.8%) 7 (0.2%)
160 NumEmployed2019 [integer]
Mean (sd) : 144359 (2808078)
min ≤ med ≤ max:
212 ≤ 11277 ≤ 1.57e+08
IQR (CV) : 27347 (19.5)
3107 distinct values 3272 (99.8%) 7 (0.2%)
161 NumEmployed2013 [integer]
Mean (sd) : 132354 (2573464)
min ≤ med ≤ max:
71 ≤ 10807 ≤ 1.44e+08
IQR (CV) : 25757 (19.4)
3130 distinct values 3272 (99.8%) 7 (0.2%)
162 NumUnemployed2007 [integer]
Mean (sd) : 6508 (126039)
min ≤ med ≤ max:
3 ≤ 650 ≤ 7035039
IQR (CV) : 1462 (19.4)
1909 distinct values 3267 (99.6%) 12 (0.4%)
163 NumEmployed2007 [integer]
Mean (sd) : 133673 (2594574)
min ≤ med ≤ max:
38 ≤ 11744 ≤ 1.45e+08
IQR (CV) : 26580 (19.4)
3103 distinct values 3267 (99.6%) 12 (0.4%)
164 NumCivLaborforce2007 [integer]
Mean (sd) : 140181 (2720538)
min ≤ med ≤ max:
41 ≤ 12420 ≤ 1.52e+08
IQR (CV) : 27822 (19.4)
3132 distinct values 3267 (99.6%) 12 (0.4%)
165 NumUnemployed2012 [integer]
Mean (sd) : 11532 (224694)
min ≤ med ≤ max:
4 ≤ 1044 ≤ 12518797
IQR (CV) : 2281 (19.5)
2221 distinct values 3272 (99.8%) 7 (0.2%)
166 NumCivLaborforce2015 [integer]
Mean (sd) : 144110 (2802464)
min ≤ med ≤ max:
77 ≤ 11598 ≤ 1.57e+08
IQR (CV) : 27728 (19.4)
3119 distinct values 3272 (99.8%) 7 (0.2%)
167 RuralUrbanContinuumCode2013 [integer]
Mean (sd) : 4.9 (2.7)
min ≤ med ≤ max:
1 ≤ 6 ≤ 9
IQR (CV) : 5 (0.6)
1:472(14.6%)
2:396(12.3%)
3:369(11.5%)
4:217(6.7%)
5:92(2.9%)
6:597(18.5%)
7:434(13.5%)
8:220(6.8%)
9:425(13.2%)
3222 (98.3%) 57 (1.7%)
168 UrbanInfluenceCode2013 [integer]
Mean (sd) : 5.2 (3.5)
min ≤ med ≤ max:
1 ≤ 5 ≤ 12
IQR (CV) : 6 (0.7)
12 distinct values 3222 (98.3%) 57 (1.7%)
169 RuralUrbanContinuumCode2003 [integer]
Mean (sd) : 5.1 (2.7)
min ≤ med ≤ max:
1 ≤ 6 ≤ 9
IQR (CV) : 4 (0.5)
1:455(14.1%)
2:336(10.4%)
3:368(11.4%)
4:221(6.9%)
5:105(3.3%)
6:613(19.0%)
7:453(14.0%)
8:235(7.3%)
9:439(13.6%)
3225 (98.4%) 54 (1.6%)
170 UrbanInfluenceCode2003 [integer]
Mean (sd) : 5.4 (3.5)
min ≤ med ≤ max:
1 ≤ 5 ≤ 12
IQR (CV) : 6 (0.6)
12 distinct values 3225 (98.4%) 54 (1.6%)
171 Metro2013 [integer]
Min : 0
Mean : 0.4
Max : 1
0:1985(61.6%)
1:1237(38.4%)
3222 (98.3%) 57 (1.7%)
172 Nonmetro2013 [integer]
Min : 0
Mean : 0.6
Max : 1
0:1237(38.4%)
1:1985(61.6%)
3222 (98.3%) 57 (1.7%)
173 Micropolitan2013 [integer]
Min : 0
Mean : 0.2
Max : 1
0:2576(80.0%)
1:646(20.0%)
3222 (98.3%) 57 (1.7%)
174 Type_2015_Update [integer]
Mean (sd) : 1.8 (1.8)
min ≤ med ≤ max:
0 ≤ 1 ≤ 5
IQR (CV) : 3 (1)
0:1237(39.4%)
1:444(14.1%)
2:221(7.0%)
3:501(15.9%)
4:407(12.9%)
5:333(10.6%)
3143 (95.9%) 136 (4.1%)
175 Type_2015_Farming_NO [integer]
Min : 0
Mean : 0.1
Max : 1
0:2699(85.9%)
1:444(14.1%)
3143 (95.9%) 136 (4.1%)
176 Type_2015_Manufacturing_NO [integer]
Min : 0
Mean : 0.2
Max : 1
0:2642(84.1%)
1:501(15.9%)
3143 (95.9%) 136 (4.1%)
177 Type_2015_Mining_NO [integer]
Min : 0
Mean : 0.1
Max : 1
0:2922(93.0%)
1:221(7.0%)
3143 (95.9%) 136 (4.1%)
178 Type_2015_Government_NO [integer]
Min : 0
Mean : 0.1
Max : 1
0:2736(87.1%)
1:407(12.9%)
3143 (95.9%) 136 (4.1%)
179 Type_2015_Recreation_NO [integer]
Min : 0
Mean : 0.1
Max : 1
0:2810(89.4%)
1:333(10.6%)
3143 (95.9%) 136 (4.1%)
180 Low_Education_2015_update [integer]
Min : 0
Mean : 0.1
Max : 1
0:2676(85.1%)
1:467(14.9%)
3143 (95.9%) 136 (4.1%)
181 Low_Employment_2015_update [integer]
Min : 0
Mean : 0.3
Max : 1
0:2237(71.2%)
1:906(28.8%)
3143 (95.9%) 136 (4.1%)
182 Population_loss_2015_update [integer]
Min : 0
Mean : 0.2
Max : 1
0:2614(83.2%)
1:529(16.8%)
3143 (95.9%) 136 (4.1%)
183 Retirement_Destination_2015_Update [integer]
Min : 0
Mean : 0.1
Max : 1
0:2701(85.9%)
1:442(14.1%)
3143 (95.9%) 136 (4.1%)
184 Perpov_1980_0711 [integer]
Min : 0
Mean : 0.1
Max : 1
0:2790(88.8%)
1:353(11.2%)
3143 (95.9%) 136 (4.1%)
185 PersistentChildPoverty_1980_2011 [integer]
Min : 0
Mean : 0.2
Max : 1
0:2435(77.5%)
1:708(22.5%)
3143 (95.9%) 136 (4.1%)
186 Hipov [integer]
Mean (sd) : 0.2 (0.5)
min ≤ med ≤ max:
-9 ≤ 0 ≤ 1
IQR (CV) : 0 (2)
-9:1(0.0%)
0:2482(77.1%)
1:738(22.9%)
3221 (98.2%) 58 (1.8%)
187 HiAmenity [integer]
Min : 0
Mean : 0.3
Max : 1
0:2330(75.0%)
1:777(25.0%)
3107 (94.8%) 172 (5.2%)
188 HiCreativeClass2000 [integer]
Min : 0
Mean : 0.3
Max : 1
0:2353(75.0%)
1:786(25.0%)
3139 (95.7%) 140 (4.3%)
189 Gas_Change [integer]
Mean (sd) : 0.6 (2)
min ≤ med ≤ max:
0 ≤ 0 ≤ 9
IQR (CV) : 0 (3.5)
0:2780(89.4%)
2:174(5.6%)
9:155(5.0%)
3109 (94.8%) 170 (5.2%)
190 Oil_Change [integer]
Mean (sd) : 0.4 (1.8)
min ≤ med ≤ max:
0 ≤ 0 ≤ 9
IQR (CV) : 0 (4.2)
0:2880(92.6%)
2:107(3.4%)
9:122(3.9%)
3109 (94.8%) 170 (5.2%)
191 Oil_Gas_Change [integer]
Mean (sd) : 0.8 (2.3)
min ≤ med ≤ max:
0 ≤ 0 ≤ 9
IQR (CV) : 0 (3)
0:2679(86.2%)
2:218(7.0%)
9:212(6.8%)
3109 (94.8%) 170 (5.2%)
192 Metro2003 [integer]
Min : 0
Mean : 0.4
Max : 1
0:2066(64.1%)
1:1159(35.9%)
3225 (98.4%) 54 (1.6%)
193 NonmetroNotAdj2003 [integer]
Min : 0
Mean : 0.3
Max : 1
0:2228(69.1%)
1:997(30.9%)
3225 (98.4%) 54 (1.6%)
194 NonmetroAdj2003 [integer]
Min : 0
Mean : 0.3
Max : 1
0:2155(66.8%)
1:1070(33.2%)
3225 (98.4%) 54 (1.6%)
195 Noncore2003 [integer]
Min : 0
Mean : 0.4
Max : 1
0:1839(57.0%)
1:1386(43.0%)
3225 (98.4%) 54 (1.6%)
196 EconomicDependence2000 [integer]
Mean (sd) : 3.9 (1.7)
min ≤ med ≤ max:
1 ≤ 4 ≤ 6
IQR (CV) : 3 (0.4)
1:440(14.0%)
2:128(4.1%)
3:904(28.8%)
4:381(12.1%)
5:340(10.8%)
6:948(30.2%)
3141 (95.8%) 138 (4.2%)
197 Nonmetro2003 [integer]
Min : 0
Mean : 0.6
Max : 1
0:1159(35.9%)
1:2066(64.1%)
3225 (98.4%) 54 (1.6%)
198 Micropolitan2003 [integer]
Min : 0
Mean : 0.2
Max : 1
0:2545(78.9%)
1:680(21.1%)
3225 (98.4%) 54 (1.6%)
199 FarmDependent2003 [integer]
Min : 0
Mean : 0.1
Max : 1
0:2701(86.0%)
1:440(14.0%)
3141 (95.8%) 138 (4.2%)
200 ManufacturingDependent2000 [integer]
Min : 0
Mean : 0.3
Max : 1
0:2237(71.2%)
1:904(28.8%)
3141 (95.8%) 138 (4.2%)
201 LowEducation2000 [integer]
Min : 0
Mean : 0.2
Max : 1
0:2519(80.2%)
1:622(19.8%)
3141 (95.8%) 138 (4.2%)
202 RetirementDestination2000 [integer]
Min : 0
Mean : 0.1
Max : 1
0:2701(86.0%)
1:440(14.0%)
3141 (95.8%) 138 (4.2%)
203 PersistentPoverty2000 [integer]
Min : 0
Mean : 0.1
Max : 1
0:2755(87.7%)
1:386(12.3%)
3141 (95.8%) 138 (4.2%)
204 Noncore2013 [integer]
Min : 0
Mean : 0.4
Max : 1
0:1883(58.4%)
1:1339(41.6%)
3222 (98.3%) 57 (1.7%)
205 Type_2015_Nonspecialized_NO [integer]
Min : 0
Mean : 0.4
Max : 1
0:1906(60.6%)
1:1237(39.4%)
3143 (95.9%) 136 (4.1%)
206 Metro_Adjacent2013 [integer]
Min : 0
Mean : 0.3
Max : 1
0:2194(68.1%)
1:1028(31.9%)
3222 (98.3%) 57 (1.7%)
207 PersistentChildPoverty2004 [integer]
Min : 0
Mean : 0.2
Max : 1
0:2407(76.6%)
1:734(23.4%)
3141 (95.8%) 138 (4.2%)
208 RecreationDependent2000 [integer]
Min : 0
Mean : 0.1
Max : 1
0:2807(89.4%)
1:334(10.6%)
3141 (95.8%) 138 (4.2%)

Generated by summarytools 1.0.0 (R version 4.1.2)
2022-02-27

The covid_rate data set has 10008984 observations and 8 variables; it is a wide panel covering 2020-01-21 to 2021-02-20. The county_data data set has 3279 observations and 208 variables; it is cross-sectional, with 3279 counties. Missing values are not prevalent in either data set.

2.2 covid case trend

2.2.1 three states, by day and state

# unique(covid_rate$State)
covid.3state.day = covid_rate[State %in% c("New York","Washington","Florida"),.(
  cum_cases = sum(cum_cases),
  cum_deaths = sum(cum_deaths),
  week = mean(week)
),by = .(date,State)]
covid.3state.day = covid.3state.day[order(State,date)] # sort by state, then by date, before differencing

covid.3state.day[,new_cases := c(NA,diff(cum_cases)),by = State]
covid.3state.day[,date := ymd(date)]
ggplot(covid.3state.day,aes(x=date,y=new_cases,group=State,color=State))+
  geom_line()+
  scale_x_date(date_breaks = "3 month",date_labels = "%Y-%m")+
  labs(title = "New Cases Trends for NY, WA and FL")+
  theme_bw()

The biggest problem is that daily variability is extremely high, which makes the trends zigzag. There may be an innate day-of-week effect, because people's work and life patterns differ across the days of a week. We may wish to smooth out this excess noise by aggregating to the weekly level.
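As a minimal sketch of why weekly aggregation helps, the toy example below builds a hypothetical daily series (a smooth trend multiplied by an assumed weekday reporting effect; none of these numbers come from the covid_rate data) and compares how rough the daily and weekly series are. The helper `rel_change` is introduced here for illustration only.

```r
# Toy daily series (hypothetical numbers): a smooth trend times a weekday reporting effect
dates <- seq(as.Date("2020-06-01"), by = "day", length.out = 56)  # 8 full weeks, starting on a Monday
trend <- seq(100, 300, length.out = length(dates))
weekday_effect <- ifelse(format(dates, "%u") %in% c("6", "7"), 0.5, 1.2)  # assumed weekend under-reporting
daily <- data.frame(date = dates, new_cases = round(trend * weekday_effect))

# Aggregate to weekly totals: each block of 7 consecutive days forms one week
daily$week <- (seq_along(dates) - 1) %/% 7 + 1
weekly <- aggregate(new_cases ~ week, data = daily, FUN = sum)

# Roughness measure: mean absolute relative change between consecutive points
rel_change <- function(x) mean(abs(diff(x)) / head(x, -1))
daily_roughness <- rel_change(daily$new_cases)
weekly_roughness <- rel_change(weekly$new_cases)
print(c(daily = daily_roughness, weekly = weekly_roughness))
```

In this toy setting the weekly series changes much more smoothly from point to point, because each weekly total contains exactly one copy of every weekday effect.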

2.2.2 Spaghetti Plots

# check
# check: some counties do not report cases on some days, and newly added counties change the count
covid_rate[State == "New York",length(County),by = .(State,date)]
##         State       date V1
##   1: New York 2020-03-12 17
##   2: New York 2020-03-13 18
##   3: New York 2020-03-14 21
##   4: New York 2020-03-15 25
##   5: New York 2020-03-16 28
##  ---                       
## 354: New York 2020-03-07  9
## 355: New York 2020-03-08 11
## 356: New York 2020-03-09 11
## 357: New York 2020-03-10 11
## 358: New York 2020-03-11 12

We find that the number of counties reporting cases increases over time. We may want to adjust the total population accordingly, so that each state's denominator only includes the counties that report.

# Cumulative counts on the last recorded day of each week, by county (assumes rows are sorted by date)
covid.weekend = covid_rate[,.(
  cum_cases_weekend = cum_cases[length(cum_cases)],
  cum_deaths_weekend = cum_deaths[length(cum_deaths)],
  TotalPopEst2019.weekend = TotalPopEst2019[length(TotalPopEst2019)]
), by = .(week,State,County)]
covid.weekend = covid.weekend[order(State,County,week)]
# Weekly new counts are differences of consecutive week-end cumulative counts
covid.weekend[,":="(new_cases = c(NA,diff(cum_cases_weekend)),
                    new_deaths = c(NA,diff(cum_deaths_weekend))),
              by = .(State,County)]
# Aggregate to state level over the counties that report in each week
covid.week = covid.weekend[!is.na(new_cases)&!is.na(new_deaths),.(
  new_cases = sum(new_cases),
  new_deaths = sum(new_deaths),
  TotalPopEst2019 = sum(TotalPopEst2019.weekend)
),by = .(State,week)]
covid.week[,weekly_case_per100k := new_cases/TotalPopEst2019*100000]

# data.ma = covid_rate[State == "Massachusetts"] # pay attention: week 33 of MA

ggplot(covid.week,aes(x=week,y=weekly_case_per100k,group=State,color=State))+
  geom_line()+
  labs(title = "Spaghetti Plots, Weekly New Cases")+
  theme_bw()
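The per-100k normalization above can be checked on a toy example. The sketch below uses hypothetical counties A, B and C (made-up case counts and populations, not taken from covid_rate), where county C only starts reporting in week 2, and contrasts a denominator built from the reporting counties with a fixed full-population denominator.

```r
# Toy county-week table (hypothetical values): county C starts reporting in week 2
toy <- data.frame(
  county    = c("A", "B", "A", "B", "C"),
  week      = c(1, 1, 2, 2, 2),
  new_cases = c(50, 30, 60, 40, 20),
  pop       = c(100000, 50000, 100000, 50000, 250000)
)

# Sum cases and population over the counties that actually report in each week
by_week <- aggregate(cbind(new_cases, pop) ~ week, data = toy, FUN = sum)
by_week$per100k <- by_week$new_cases / by_week$pop * 1e5

# For contrast: divide by the full population of all three counties in every week
full_pop <- 100000 + 50000 + 250000
by_week$per100k_fullpop <- by_week$new_cases / full_pop * 1e5
print(by_week)
```

In week 1 the two rates differ, because the fixed denominator counts people in a county that has not reported anything yet; once all counties report, the two versions agree.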

2.2.3 Summary

There are two possible explanations for the differences across states. First, many states, such as those in the Northeast and California, have densely populated urban centers, where COVID tends to spread much more quickly. Second, some states imposed stricter lockdown measures and mask mandates early on to combat the pandemic. These initiatives likely slowed the spread of COVID.

2.2.4 Effectiveness of lockdown by graphs

# summary(covid_intervention)
# some examples

covid_intervention[STATE == "New York"]
## Empty data.table (0 rows and 16 cols): FIPS,STATE,AREA_NAME,stay at home,>50 gatherings,>500 gatherings...
covid_intervention_state = covid_intervention[substr(FIPS,nchar(FIPS)-2,nchar(FIPS)) == "000"]

# NY
ggplot(covid.3state.day[State == "New York"],aes(x=date,y=new_cases))+
  geom_line()+
  scale_x_date(date_breaks = "3 month",date_labels = "%Y-%m")+
  geom_vline(xintercept = covid_intervention_state[STATE == "NY","stay at home"][[1,1]],color = "red",lty = 5)+
  labs(title = "New Cases Trends for NY")+
  theme_bw()

# FL
ggplot(covid.3state.day[State == "Florida"],aes(x=date,y=new_cases))+
  geom_line()+
  scale_x_date(date_breaks = "3 month",date_labels = "%Y-%m")+
  geom_vline(xintercept = covid_intervention_state[STATE == "FL","stay at home"][[1,1]],color = "red",lty = 5)+
  labs(title = "New Cases Trends for FL")+
  theme_bw()

The graphs for these two states suggest that the lockdown policy may have helped slow the spread of the virus.

2.3 covid death trend

2.3.1 Monthly deaths per 100k heatmap

# pay attention to numbers smaller than zero: a decrease in a cumulative count yields a negative difference
covid_rate = covid_rate[order(FIPS,date)]
covid_rate[,":="(new_cases = c(NA,diff(cum_cases)),
                 new_deaths = c(NA,diff(cum_deaths)),
                 year = year(date),
                 month = month(date),
                 year.month = floor_date(date,"month")),by = FIPS] # floor_date assigns each date to its own month
covid.month = covid_rate[!is.na(new_cases)&!is.na(new_deaths),.(
  new_cases = sum(new_cases),
  new_deaths = sum(new_deaths),
  TotalPopEst2019 = TotalPopEst2019[length(TotalPopEst2019)]
),by = .(State,County,year,month)]
covid.month = covid.month[,.(
  new_cases = sum(new_cases),
  new_deaths = sum(new_deaths),
  TotalPopEst2019 = sum(TotalPopEst2019)
),by = .(State,year,month)]

covid.month[,monthly_death_per100k := new_deaths/TotalPopEst2019*100000]
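The warning about numbers smaller than zero can be illustrated with a small sketch. The cumulative counts below are hypothetical, and truncating at zero is only one possible treatment (an assumption for illustration, not something the analysis above applies).

```r
# Hypothetical cumulative death counts for one county; the drop from 12 to 11
# mimics a decrease in the reported cumulative total
cum_deaths <- c(0, 3, 7, 12, 11, 15)

# Same differencing as above: the first entry is NA because there is no previous day
new_deaths <- c(NA, diff(cum_deaths))

# Flag the negative increments so they can be inspected
negative_days <- which(new_deaths < 0)

# One possible treatment (an assumption, not the choice made above): truncate at zero
new_deaths_clipped <- pmax(new_deaths, 0)
print(rbind(new_deaths, new_deaths_clipped))
```

Note that the unclipped increments still sum to the final cumulative count, whereas the clipped ones no longer do, so truncation trades consistency of totals for non-negative monthly values.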

Here we show only one example, September 2020 (2020-9). The plots for all months are shown in (ii).

covid.death.plot.list = list()
setnames(covid.month,"State","state")
max_col = max(covid.month$monthly_death_per100k,na.rm = TRUE) # common color scale across all months
min_col = min(covid.month$monthly_death_per100k,na.rm = TRUE)
for (i in 2:12) {
  covid.death.plot.list[[i-1]] =
    plot_usmap(regions = "state",data = covid.month[year == 2020 & month == i],
               values = "monthly_death_per100k", exclude = c("Hawaii", "Alaska"),color = "black") + 
    scale_fill_distiller(
      palette = "Reds",direction = 1,
      name = "Number of New Covid Deaths per 100,000 People", 
      limits = c(min_col, max_col)) + 
    labs(title = paste0("New Covid Deaths, 2020-",i), subtitle = "Continental US States") +
    theme(legend.position = "right")
}

ggplotly(covid.death.plot.list[[8]])
# plot_usmap(regions = "state",data = covid.month[year == 2020 & month == i],
#                values = "monthly_death_per100k", exclude = c("Hawaii", "Alaska"),color = "black") + 
#     scale_fill_gradient(
#       low = "white", high = "red", 
#       name = "Number of New Covid Deaths per 100,000 People", 
#       label = scales::comma) + 
#     labs(title = paste0("New Covid Deaths, 2020-",i), subtitle = "Continental US States") +
#     theme(legend.position = "right")

2.3.2 Animations

# plotly
abbr = unique(us_map(regions = "states") %>% select(abbr, full))

plotly.data = covid.month[year == 2020,.(month,state,monthly_death_per100k)] %>%
  mutate(hover =  paste(state, '<br>', 
                        'new covid deaths', round(monthly_death_per100k, 3))) %>% 
  left_join(abbr,by = c("state" = "full"))

# give state boundaries a white border
l <- list(color = toRGB("white"), width = 2)
# specify some map projection/options
g <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white'),
  exclude = c("")
)

fig <- plot_geo(plotly.data, locationmode = 'USA-states')
fig <- fig %>% add_trace(
    z = ~monthly_death_per100k, text = ~hover, locations = ~abbr,
    color = ~monthly_death_per100k, colors = 'Reds',
    zmin = min_col, zmax = max_col,frame = ~month
  )

fig <- fig %>% colorbar(title = "Monthly new covid deaths of state")
fig <- fig %>% layout(
    title = 'Monthly new covid deaths of state',
    geo = g,
    hoverlabel = list(bgcolor="white")
  )
fig
save.image("codes/hw3_covid.RData")
load("codes/hw3_covid.RData")

2.4 covid factor

county.covid = covid_rate[date == as.Date("2021-02-01"),.(
  FIPS,County,State,
  total_death_per100k = cum_deaths/TotalPopEst2019*100000
)] %>% 
  right_join(county_data, by = "FIPS") %>%
  mutate(log_total_death_per100k = log(total_death_per100k + 1))

county.covid.sub <- county.covid %>%
  select(log_total_death_per100k, State.x, County.x, FIPS,
         Deep_Pov_All, PovertyAllAgesPct, PerCapitaInc, UnempRate2019,
         PctEmpFIRE, PctEmpConstruction, PctEmpTrans, PctEmpMining, PctEmpTrade,
         PctEmpInformation, PctEmpAgriculture, PctEmpManufacturing, PctEmpServices,
         PopDensity2010, OwnHomePct, Age65AndOlderPct2010, TotalPop25Plus, Under18Pct2010,
         Ed2HSDiplomaOnlyPct, Ed3SomeCollegePct, Ed4AssocDegreePct, Ed5CollegePlusPct,
         ForeignBornPct, Net_International_Migration_Rate_2010_2019, NetMigrationRate1019,
         NaturalChangeRate1019, TotalPopEst2019,
         WhiteNonHispanicPct2010, NativeAmericanNonHispanicPct2010, BlackNonHispanicPct2010,
         AsianNonHispanicPct2010, HispanicPct2010,
         Type_2015_Update, RuralUrbanContinuumCode2013, UrbanInfluenceCode2013,
         Perpov_1980_0711, HiCreativeClass2000, HiAmenity, Retirement_Destination_2015_Update)

setnames(county.covid.sub,c("State.x","County.x"),c("state","county"))

Number of missing values in each variable

# county.covid.sub[is.na(state)]
# state and county are missing for the aggregate state rows, Puerto Rico, Hawaii and Alaska; Bedford, Virginia has two observations (duplicated)
colSums(is.na(county.covid.sub))
##                    log_total_death_per100k 
##                                        171 
##                                      state 
##                                        171 
##                                     county 
##                                        171 
##                                       FIPS 
##                                          0 
##                               Deep_Pov_All 
##                                          6 
##                          PovertyAllAgesPct 
##                                         86 
##                               PerCapitaInc 
##                                          6 
##                              UnempRate2019 
##                                          7 
##                                 PctEmpFIRE 
##                                          7 
##                         PctEmpConstruction 
##                                          7 
##                                PctEmpTrans 
##                                          7 
##                               PctEmpMining 
##                                          7 
##                                PctEmpTrade 
##                                          7 
##                          PctEmpInformation 
##                                          7 
##                          PctEmpAgriculture 
##                                          7 
##                        PctEmpManufacturing 
##                                          7 
##                             PctEmpServices 
##                                          7 
##                             PopDensity2010 
##                                          6 
##                                 OwnHomePct 
##                                          6 
##                       Age65AndOlderPct2010 
##                                          6 
##                             TotalPop25Plus 
##                                          6 
##                             Under18Pct2010 
##                                          6 
##                        Ed2HSDiplomaOnlyPct 
##                                          6 
##                          Ed3SomeCollegePct 
##                                          6 
##                          Ed4AssocDegreePct 
##                                          6 
##                          Ed5CollegePlusPct 
##                                          6 
##                             ForeignBornPct 
##                                         85 
## Net_International_Migration_Rate_2010_2019 
##                                         85 
##                       NetMigrationRate1019 
##                                         85 
##                      NaturalChangeRate1019 
##                                         85 
##                            TotalPopEst2019 
##                                          6 
##                    WhiteNonHispanicPct2010 
##                                          6 
##           NativeAmericanNonHispanicPct2010 
##                                          6 
##                    BlackNonHispanicPct2010 
##                                          6 
##                    AsianNonHispanicPct2010 
##                                          6 
##                            HispanicPct2010 
##                                          6 
##                           Type_2015_Update 
##                                        136 
##                RuralUrbanContinuumCode2013 
##                                         57 
##                     UrbanInfluenceCode2013 
##                                         57 
##                           Perpov_1980_0711 
##                                        136 
##                        HiCreativeClass2000 
##                                        140 
##                                  HiAmenity 
##                                        172 
##         Retirement_Destination_2015_Update 
##                                        136
county.covid.sub = na.omit(county.covid.sub) # -174

2.4.1 lasso

set.seed(1)
# Design matrix without the intercept column; Alabama is the reference state, so it has no dummy
model.var = model.matrix(log_total_death_per100k~.,data = county.covid.sub[,-c("FIPS","county")])[,-1]
# head(model.var)
death = county.covid.sub[,log_total_death_per100k]
# alpha = 1 is the lasso case of the elastic net; penalty.factor = 0 leaves the first 48 columns (state dummies) unpenalized
fit1.cv = cv.glmnet(model.var,death,alpha = 1, nfolds = 10, intercept = T,
                    penalty.factor = c(rep(0,48),rep(1,ncol(model.var)-48)))
# Coefficients at lambda.1se (despite the name coef.min): usually more parsimonious; lambda.min may keep noise variables
coef.min = coef(fit1.cv,s="lambda.1se")
# coef(fit1.cv,s="lambda.min")
nonzero.coef = coef.min[which(coef.min!=0),]
# plot(fit1.cv)
nonzero.var = names(nonzero.coef)[-1]
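For reference, the criterion behind this call can be written out. The display below is a sketch of the penalized least-squares objective for the Gaussian case with alpha = 1, where \(v_j\) denotes the penalty factors passed to `penalty.factor`; the symbols \(n\), \(p\), \(x_i\), \(y_i\) and \(v_j\) are notation introduced here, not taken from the code. As far as we are aware, glmnet also rescales the penalty factors internally so that they sum to the number of variables, which changes the scale of \(\lambda\) but not which coefficients are left unpenalized.

```latex
\[
\min_{\beta_0,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i-\beta_0-x_i^{\top}\beta\bigr)^2
+\lambda\sum_{j=1}^{p} v_j\,|\beta_j|,
\qquad
v_j=\begin{cases}
0, & j\le 48 \quad (\text{state dummies}),\\
1, & j>48 \quad (\text{county covariates}).
\end{cases}
\]
```

Because \(v_j = 0\) for the state dummies, they are never shrunk to zero, so the lasso only selects among the county-level covariates.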

2.4.2 Fine tune the model

Relaxed lasso result

data.selected = data.table(death,model.var[,which(colnames(model.var) %in% nonzero.var)])

fit.1se.lm = lm(death~.,data = data.selected)
# relaxed lasso
stargazer(fit.1se.lm,type=output_format, align=TRUE)
Dependent variable:
death
stateArizona 0.267
(0.238)
stateArkansas 0.056
(0.135)
stateCalifornia -0.771***
(0.159)
stateColorado -0.357**
(0.153)
stateConnecticut 0.178
(0.303)
stateDelaware -0.024
(0.469)
stateDistrict of Columbia 0.520
(0.807)
stateFlorida 0.031
(0.147)
stateGeorgia 0.066
(0.117)
stateIdaho -0.322*
(0.166)
stateIllinois 0.211
(0.132)
stateIndiana -0.074
(0.134)
stateIowa 0.287**
(0.134)
stateKansas -0.628***
(0.137)
stateKentucky -0.756***
(0.128)
stateLouisiana 0.375***
(0.143)
stateMaine -1.370***
(0.225)
stateMaryland -0.059
(0.196)
stateMassachusetts 0.077
(0.240)
stateMichigan -0.147
(0.135)
stateMinnesota -0.157
(0.138)
stateMississippi 0.264**
(0.132)
stateMissouri -0.378***
(0.129)
stateMontana 0.547***
(0.158)
stateNebraska -0.386***
(0.143)
stateNevada -0.577**
(0.227)
stateNew Hampshire -0.806***
(0.273)
stateNew Jersey 0.329
(0.209)
stateNew Mexico -0.653***
(0.192)
stateNew York -0.367**
(0.151)
stateNorth Carolina -0.368***
(0.127)
stateNorth Dakota 0.950***
(0.166)
stateOhio -0.523***
(0.134)
stateOklahoma -0.374***
(0.137)
stateOregon -0.865***
(0.174)
statePennsylvania 0.039
(0.146)
stateRhode Island -0.177
(0.372)
stateSouth Carolina -0.011
(0.153)
stateSouth Dakota 0.809***
(0.151)
stateTennessee 0.058
(0.130)
stateTexas 0.168
(0.127)
stateUtah -1.090***
(0.196)
stateVermont -2.140***
(0.239)
stateVirginia -0.476***
(0.123)
stateWashington -0.857***
(0.168)
stateWest Virginia -0.678***
(0.155)
stateWisconsin -0.134
(0.140)
stateWyoming 0.208
(0.205)
PovertyAllAgesPct 0.003
(0.005)
PerCapitaInc -0.00001
(0.00001)
PctEmpConstruction -0.049***
(0.007)
PctEmpMining -0.019***
(0.006)
PctEmpAgriculture -0.039***
(0.004)
PctEmpManufacturing 0.003
(0.003)
PopDensity2010 0.00003***
(0.00001)
Age65AndOlderPct2010 0.034***
(0.008)
Under18Pct2010 0.060***
(0.008)
Ed3SomeCollegePct -0.021***
(0.005)
Ed5CollegePlusPct -0.011***
(0.004)
NetMigrationRate1019 -0.012***
(0.003)
NaturalChangeRate1019 -0.044***
(0.010)
WhiteNonHispanicPct2010 -0.003*
(0.002)
HispanicPct2010 0.009***
(0.002)
Type_2015_Update -0.019**
(0.009)
Constant 4.530***
(0.417)
Observations 3,105
R2 0.359
Adjusted R2 0.345
Residual Std. Error 0.790 (df = 3040)
F Statistic 26.600*** (df = 64; 3040)
Note: *p<0.1; **p<0.05; ***p<0.01

BIC graphs

fit.final.1 =  regsubsets(death~.,data = data.selected,method = "exhaustive",
                          nvmax = ncol(data.selected)-1,force.in = c(1:48),really.big = T) # compared to Arizona

summary.fit.final.1 = summary(fit.final.1)
plot(summary.fit.final.1$bic)

opt.index = 10 # with bic

We choose \(p=10\) by the BIC criterion.
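The model size can also be read off programmatically rather than from the plot. The sketch below illustrates the idea with base R only, on simulated data and a fixed nested sequence of candidate models; it does not use regsubsets or the county data, and all variable names (`x1` to `x4`, `candidates`) are hypothetical.

```r
# Simulated data (hypothetical): y depends on x1 and x2 only; x3 and x4 are noise
set.seed(1)
n <- 200
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
X$y <- 1 + 2 * X$x1 - 1.5 * X$x2 + rnorm(n)

# Candidate models of increasing size (a fixed nested sequence, not an exhaustive search)
candidates <- list(
  y ~ x1,
  y ~ x1 + x2,
  y ~ x1 + x2 + x3,
  y ~ x1 + x2 + x3 + x4
)

# BIC for each candidate; smaller is better
bic <- sapply(candidates, function(f) BIC(lm(f, data = X)))
opt.index <- which.min(bic)
print(bic)
print(opt.index)
```

With the regsubsets summary used above, the analogous step would be `which.min(summary.fit.final.1$bic)`, which could replace the hard-coded index if the minimum is what is wanted.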

Final model after fine tuning

bic.var.select = summary.fit.final.1$which[opt.index,-1]
bic.var = names(bic.var.select)[which(bic.var.select)] 
# bic.var
final.expr = as.formula(paste("death", "~", paste(bic.var, collapse = "+"))) 
fit.final.2 = lm(final.expr,data = data.selected)

stargazer(fit.final.2,type=output_format, align=TRUE)
Dependent variable:
death
stateArizona 0.208
(0.237)
stateArkansas 0.021
(0.134)
stateCalifornia -0.863***
(0.155)
stateColorado -0.434***
(0.150)
stateConnecticut 0.070
(0.299)
stateDelaware -0.099
(0.469)
stateDistrict of Columbia 0.657
(0.804)
stateFlorida -0.016
(0.142)
stateGeorgia 0.053
(0.116)
stateIdaho -0.399**
(0.163)
stateIllinois 0.107
(0.127)
stateIndiana -0.183
(0.128)
stateIowa 0.178
(0.129)
stateKansas -0.706***
(0.133)
stateKentucky -0.841***
(0.121)
stateLouisiana 0.371***
(0.141)
stateMaine -1.490***
(0.222)
stateMaryland -0.157
(0.191)
stateMassachusetts -0.005
(0.238)
stateMichigan -0.237*
(0.132)
stateMinnesota -0.268**
(0.133)
stateMississippi 0.309**
(0.131)
stateMissouri -0.459***
(0.123)
stateMontana 0.485***
(0.155)
stateNebraska -0.482***
(0.138)
stateNevada -0.647***
(0.226)
stateNew Hampshire -0.950***
(0.271)
stateNew Jersey 0.254
(0.203)
stateNew Mexico -0.725***
(0.190)
stateNew York -0.429***
(0.142)
stateNorth Carolina -0.387***
(0.126)
stateNorth Dakota 0.848***
(0.161)
stateOhio -0.632***
(0.129)
stateOklahoma -0.400***
(0.136)
stateOregon -0.932***
(0.172)
statePennsylvania -0.068
(0.140)
stateRhode Island -0.287
(0.370)
stateSouth Carolina -0.001
(0.152)
stateSouth Dakota 0.738***
(0.148)
stateTennessee -0.003
(0.128)
stateTexas 0.119
(0.126)
stateUtah -1.200***
(0.188)
stateVermont -2.280***
(0.236)
stateVirginia -0.516***
(0.121)
stateWashington -0.923***
(0.167)
stateWest Virginia -0.792***
(0.148)
stateWisconsin -0.252*
(0.136)
stateWyoming 0.106
(0.203)
PctEmpConstruction -0.057***
(0.007)
PctEmpMining -0.024***
(0.005)
PctEmpAgriculture -0.041***
(0.003)
Age65AndOlderPct2010 0.032***
(0.008)
Under18Pct2010 0.060***
(0.007)
Ed3SomeCollegePct -0.024***
(0.005)
Ed5CollegePlusPct -0.016***
(0.002)
NetMigrationRate1019 -0.014***
(0.002)
NaturalChangeRate1019 -0.040***
(0.010)
HispanicPct2010 0.011***
(0.002)
Constant 4.520***
(0.267)
Observations 3,105
R2 0.354
Adjusted R2 0.342
Residual Std. Error 0.792 (df = 3046)
F Statistic 28.800*** (df = 58; 3046)
Note: *p<0.1; **p<0.05; ***p<0.01
# all ten selected non-state predictors are significant at the 1% level

Age65AndOlderPct2010 has a significantly positive coefficient (0.032). However, Under18Pct2010 also has a significantly positive coefficient (0.060), which is larger than that for the elderly. This does not strongly support the argument that COVID affects the elderly the most; we would rather interpret it as showing that COVID affects the middle-aged the least.
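Whether the two age coefficients really differ can be checked with a Wald test on their difference, which accounts for the covariance between the two estimates. The sketch below uses simulated data; old, young and y are hypothetical stand-ins for the two age shares and the response, and the simulated effect sizes are chosen only to resemble the table.

```r
set.seed(1)

# Simulated stand-in data; not the county data used in the report.
n <- 500
old <- rnorm(n, mean = 18, sd = 4)
young <- rnorm(n, mean = 22, sd = 3)
y <- 1 + 0.03 * old + 0.06 * young + rnorm(n)
fit.demo <- lm(y ~ old + young)

# Wald test of H0: beta_young - beta_old = 0.
b <- coef(fit.demo)
V <- vcov(fit.demo)
diff.est <- unname(b["young"] - b["old"])
# Var(b1 - b2) = Var(b1) + Var(b2) - 2 Cov(b1, b2)
diff.se <- sqrt(V["young", "young"] + V["old", "old"] - 2 * V["young", "old"])
t.stat <- diff.est / diff.se
p.value <- 2 * pt(abs(t.stat), df = df.residual(fit.demo), lower.tail = FALSE)
c(estimate = diff.est, se = diff.se, t = t.stat, p = p.value)
```

A large p-value here would mean the data cannot distinguish the two coefficients, in which case the ordering of the point estimates should not be over-interpreted.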

BlackPct is not selected into the regression after controlling for the other variables, while HispanicPct2010 is. Its coefficient is significantly positive (0.011), indicating that a higher Hispanic percentage in a county is associated with a higher COVID death rate. The analysis gives some support for a higher death rate in the Hispanic group; the evidence for the Black group is unclear.

2.4.3 Diagnosis

scatter.list = list()

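# Entries 49:58 of bic.var are the ten non-state predictors.
# ggplot evaluates aesthetics lazily, so aes(x = data.selected[[name]]) inside a loop
# would use the last value of name for every plot; aes_string() avoids this.
for (i in 49:58) {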
  name = bic.var[i]
  scatter.list[[i-48]] = ggplot(data.selected,aes_string(x=name,y="death"))+
    geom_point()+
    geom_smooth(method = "lm")+
    xlab(name)+
    ylab("log.new.death")+
    theme_bw()
}
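Since aes_string() is deprecated in recent ggplot2 releases, the same loop can be written with lapply() and the .data pronoun. This is a sketch on the built-in mtcars data, not the report's own code; vars.demo and scatter.demo are illustrative names, and mpg stands in for death.

```r
library(ggplot2)

# Illustrative data: mtcars stands in for data.selected.
vars.demo <- c("wt", "hp", "disp")

scatter.demo <- lapply(vars.demo, function(v) {
  # v is local to this function call, so each plot keeps its own column name
  # even though the aesthetic is only evaluated when the plot is drawn.
  ggplot(mtcars, aes(x = .data[[v]], y = mpg)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x) +
    xlab(v) +
    theme_bw()
})
names(scatter.demo) <- vars.demo
```

The resulting list can be passed to plot_grid(plotlist = ...) in the same way as scatter.list.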

Scatter Plots

plot_grid(plotlist = scatter.list)

# hist(county.covid$total_death_per100k,breaks = seq(0,900,10))

Residual Plot

# residual plot
plot(fit.final.2,1)

QQ Plot

# qq plot
plot(fit.final.2,2)

The fit does not look good. The residual plot shows that homoscedasticity is not well satisfied, and the QQ plot shows that the residuals have much thicker tails than the normal distribution.
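The visual impressions can be backed up with formal tests. The sketch below applies a Shapiro-Wilk test for normality and a hand-computed studentized Breusch-Pagan statistic for heteroscedasticity to an illustrative fit on mtcars; fit.demo is a stand-in for fit.final.2, and these particular tests are our choice rather than part of the original analysis.

```r
# Illustrative fit only: mtcars stands in for data.selected.
fit.demo <- lm(mpg ~ wt + hp, data = mtcars)
res <- resid(fit.demo)

# Normality: Shapiro-Wilk test (shapiro.test accepts between 3 and 5000 values).
sw <- shapiro.test(res)

# Heteroscedasticity: studentized Breusch-Pagan statistic computed by hand.
# Regress the squared residuals on the predictors; under H0, n * R^2 is
# approximately chi-squared with df equal to the number of predictors.
res.sq <- res^2
aux <- lm(res.sq ~ wt + hp, data = mtcars)
bp.stat <- nobs(aux) * summary(aux)$r.squared
bp.df <- length(coef(aux)) - 1
bp.p <- pchisq(bp.stat, df = bp.df, lower.tail = FALSE)

c(shapiro.p = sw$p.value, bp.stat = bp.stat, bp.p = bp.p)
```

With about 3,100 observations, both tests will flag even mild departures, so the plots remain the better guide to how serious the violations are.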

2.4.4 Summary

We observe that, controlling for other factors, many of the state indicators are statistically significant. In particular, states such as California, Colorado, Idaho, Kansas, Maine, Missouri, and Nebraska have significantly negative coefficients relative to the baseline state. On the other hand, Mississippi and Louisiana both have significantly positive coefficients. The states with higher COVID death rates tend to be Southern states with looser COVID restrictions and mask mandates. Despite the higher population density in much of the Northeast, several states there, such as Vermont, Maine, and New Hampshire, fared better (the Massachusetts coefficient, -0.005, is not significant). This suggests, though it does not prove, a case for COVID intervention. The government should continue to impose mask mandates and lockdown measures if cases spike significantly in the future.

2.4.5 Improvements

From the scatter plots above, we can see that many counties have a value of zero for log total COVID deaths per 100k and lie far from the other observations. Therefore, we may consider mixture models (for example, a zero-inflated model) that first classify counties into two groups and then make inferences within each group.
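A simple relative of the zero-inflated idea is a two-part (hurdle-style) model: one model for whether a county has any deaths, and a second for the log rate among counties that do. The sketch below uses simulated data and base R only; it is not a full zero-inflated mixture, and x, has.death and log.death are hypothetical names.

```r
set.seed(2)

# Simulated stand-in data; not the county data used in the report.
n <- 1000
x <- rnorm(n)
has.death <- rbinom(n, size = 1, prob = plogis(1 + 0.8 * x))
log.death <- ifelse(has.death == 1, 4 + 0.5 * x + rnorm(n, sd = 0.7), 0)
demo <- data.frame(x, has.death, log.death)

# Part 1: logistic regression for whether a county has any deaths.
part1 <- glm(has.death ~ x, family = binomial, data = demo)

# Part 2: linear model for the log rate among counties with deaths only.
part2 <- lm(log.death ~ x, data = subset(demo, has.death == 1))

# Combined expected outcome: P(any deaths) * E[log rate | any deaths].
new.x <- data.frame(x = c(-1, 0, 1))
expected <- predict(part1, new.x, type = "response") * predict(part2, new.x)
expected
```

Fitting the two parts separately keeps the zeros from distorting the slope estimates in the linear part, which is the concern raised by the scatter plots.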

As for possibly important omitted variables, medical conditions could be one of them. Also, total deaths per 100k may not be a perfect measure for our goal because it contains some randomness: the spread of COVID in an area can have random determinants, and the virus might unexpectedly break out in some areas and result in higher infection and mortality rates. We may want to complement the analysis with another dependent variable such as \(\frac{\text{TotalDeaths}}{\text{TotalCases}}\).
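The alternative response is straightforward to compute, with one caveat: it is undefined for counties with no recorded cases. A minimal sketch with made-up counts (demo and its column names are hypothetical):

```r
# Hypothetical county-level counts; column names are illustrative.
demo <- data.frame(county = c("A", "B", "C"),
                   total.cases = c(1200, 0, 350),
                   total.deaths = c(24, 0, 7))

# Deaths per confirmed case; set to NA where no cases were recorded.
demo$cfr <- ifelse(demo$total.cases > 0,
                   demo$total.deaths / demo$total.cases, NA_real_)
demo
```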

2.4.6 Possible Imputations

Missing values are clustered in Puerto Rico and Alaska. These regions differ in character from the contiguous states, so we may not trust the imputation results.
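A quick tabulation of incomplete rows by state shows where missingness concentrates. The sketch below uses a small made-up data frame; demo, x1 and x2 are hypothetical and do not come from the report's data.

```r
# Hypothetical data with missing predictors concentrated in some regions.
demo <- data.frame(state = c("Alaska", "Alaska", "Ohio", "Ohio", "Puerto Rico"),
                   x1 = c(NA, 2.1, 3.4, 1.8, NA),
                   x2 = c(0.5, NA, 0.7, 0.9, NA))

# Number of rows with at least one missing value, by state.
incomplete <- !complete.cases(demo)
missing.by.state <- sort(tapply(incomplete, demo$state, sum), decreasing = TRUE)
missing.by.state
```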